Introduction


Welcome to the PlanetMath “One Big Book” compilation, the Free Encyclopedia of
Mathematics. This book gathers in a single document the best of the hundreds of authors and
thousands of other contributors from the PlanetMath.org web site, as of January 4, 2004. 
The purpose of this compilation is to help the efforts of these people reach a wider audience 
and allow the benefits of their work to be accessed in a greater breadth of situations. 


We want to emphasize that the Free Encyclopedia of Mathematics will always be a work
in progress. Producing a book-format encyclopedia from the amorphous web of interlinked
and multidimensionally-organized entries on PlanetMath is not easy. The print medium
demands a linear presentation, and to boil the web site down into this format is a difficult,
and in some ways lossy, transformation. A major part of our editorial effort is going into
making this transformation. We hope the organization we’ve chosen for now is useful to
readers, and in future editions you can expect continuing improvements.


The “linearization” of PlanetMath.org is not the only editorial task we must perform.
Throughout the millennia, readers have come to expect a strict standard of consistency and
correctness from print books, and we must strive to meet this standard in the PlanetMath
Book as closely as possible. This means applying more editorial control to the book form
of PlanetMath than is applied to the web site. We hope you will agree that there is significant
value to be gained from unifying style, correcting errors, and filtering out not-yet-ready
content, so we will continue to do these things.


For more details on planned improvements to this book, see the TODO file that came with
this archive. Remember that you can help us to improve this work by joining PlanetMath.org
and filing corrections, adding entries, or just participating in the community. We are also
looking for volunteers to help edit this book, to help with programming related to its
production, or to help work on Noosphere, the PlanetMath software. To send us comments
about the book, use the e-mail address pmbook@planetmath.org. For general comments
and queries, use feedback@planetmath.org.


Happy mathing, 


Joe Corneli 
Aaron Krowne 


Tuesday, January 27, 2004 
Copyright © 2000 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies of this license document, but
changing it is not allowed. 


Preamble 


The purpose of this License is to make a manual, textbook, or other written document “free” 
in the sense of freedom: to assure everyone the effective freedom to copy and redistribute 
it, with or without modifying it, either commercially or noncommercially. Secondarily, this 
License preserves for the author and publisher a way to get credit for their work, while not 
being considered responsible for modifications made by others. 


This License is a kind of “copyleft”, which means that derivative works of the document must 
themselves be free in the same sense. It complements the GNU General Public License, which 
is a copyleft license designed for free software. 


We have designed this License in order to use it for manuals for free software, because free 
software needs free documentation: a free program should come with manuals providing the 
same freedoms that the software does. But this License is not limited to software manuals; it 
can be used for any textual work, regardless of subject matter or whether it is published as a 
printed book. We recommend this License principally for works whose purpose is instruction 
or reference. 


Applicability and Definitions 


This License applies to any manual or other work that contains a notice placed by the
copyright holder saying it can be distributed under the terms of this License. The “Document”,
below, refers to any such manual or work. Any member of the public is a licensee, and is
addressed as “you”. 


A “Modified Version” of the Document means any work containing the Document or a 
portion of it, either copied verbatim, or with modifications and/or translated into another 
language. 


A “Secondary Section” is a named appendix or a front-matter section of the Document 
that deals exclusively with the relationship of the publishers or authors of the Document to 
the Document’s overall subject (or to related matters) and contains nothing that could fall 
directly within that overall subject. (For example, if the Document is in part a textbook 
of mathematics, a Secondary Section may not explain any mathematics.) The relationship 
could be a matter of historical connection with the subject or with related matters, or of 
legal, commercial, philosophical, ethical or political position regarding them. 


The “Invariant Sections” are certain Secondary Sections whose titles are designated, as being 
those of Invariant Sections, in the notice that says that the Document is released under this 
License. 


The “Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or 
Back-Cover Texts, in the notice that says that the Document is released under this License. 


A “Transparent” copy of the Document means a machine-readable copy, represented in a 
format whose specification is available to the general public, whose contents can be viewed 
and edited directly and straightforwardly with generic text editors or (for images composed 
of pixels) generic paint programs or (for drawings) some widely available drawing editor, 
and that is suitable for input to text formatters or for automatic translation to a variety of 
formats suitable for input to text formatters. A copy made in an otherwise Transparent file 
format whose markup has been designed to thwart or discourage subsequent modification 
by readers is not Transparent. A copy that is not “Transparent” is called “Opaque”. 


Examples of suitable formats for Transparent copies include plain ASCII without markup, 
Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD,
and standard-conforming simple HTML designed for human modification. Opaque formats 
include PostScript, PDF, proprietary formats that can be read and edited only by proprietary 
word processors, SGML or XML for which the DTD and/or processing tools are not generally 
available, and the machine-generated HTML produced by some word processors for output 
purposes only. 


The “Title Page” means, for a printed book, the title page itself, plus such following pages 
as are needed to hold, legibly, the material this License requires to appear in the title page. 
For works in formats which do not have any title page as such, “Title Page” means the text 
near the most prominent appearance of the work’s title, preceding the beginning of the body 
of the text. 
Verbatim Copying


You may copy and distribute the Document in any medium, either commercially or
noncommercially, provided that this License, the copyright notices, and the license notice saying
this License applies to the Document are reproduced in all copies, and that you add no 
other conditions whatsoever to those of this License. You may not use technical measures 
to obstruct or control the reading or further copying of the copies you make or distribute. 
However, you may accept compensation in exchange for copies. If you distribute a large 
enough number of copies you must also follow the conditions in section 3. 


You may also lend copies, under the same conditions stated above, and you may publicly 
display copies. 


Copying in Quantity 


If you publish printed copies of the Document numbering more than 100, and the Document’s 
license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly 
and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover 
Texts on the back cover. Both covers must also clearly and legibly identify you as the 
publisher of these copies. The front cover must present the full title with all words of the 
title equally prominent and visible. You may add other material on the covers in addition. 
Copying with changes limited to the covers, as long as they preserve the title of the Document 
and satisfy these conditions, can be treated as verbatim copying in other respects. 


If the required texts for either cover are too voluminous to fit legibly, you should put the 
first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto 
adjacent pages. 


If you publish or distribute Opaque copies of the Document numbering more than 100, you 
must either include a machine-readable Transparent copy along with each Opaque copy, or 
state in or with each Opaque copy a publicly-accessible computer-network location containing 
a complete Transparent copy of the Document, free of added material, which the general 
network-using public has access to download anonymously at no charge using public-standard 
network protocols. If you use the latter option, you must take reasonably prudent steps, when 
you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy 
will remain thus accessible at the stated location until at least one year after the last time 
you distribute an Opaque copy (directly or through your agents or retailers) of that edition 
to the public. 


It is requested, but not required, that you contact the authors of the Document well before 
redistributing any large number of copies, to give them a chance to provide you with an 
updated version of the Document. 
Modifications


You may copy and distribute a Modified Version of the Document under the conditions of 
sections 2 and 3 above, provided that you release the Modified Version under precisely this 
License, with the Modified Version filling the role of the Document, thus licensing distribution 
and modification of the Modified Version to whoever possesses a copy of it. In addition, you 
must do these things in the Modified Version: 


• Use in the Title Page (and on the covers, if any) a title distinct from that of the
Document, and from those of previous versions (which should, if there were any, be 
listed in the History section of the Document). You may use the same title as a previous 
version if the original publisher of that version gives permission. 


• List on the Title Page, as authors, one or more persons or entities responsible for
authorship of the modifications in the Modified Version, together with at least five of 
the principal authors of the Document (all of its principal authors, if it has less than 
five). 


• State on the Title Page the name of the publisher of the Modified Version, as the
publisher. 


• Preserve all the copyright notices of the Document.


• Add an appropriate copyright notice for your modifications adjacent to the other
copyright notices.


• Include, immediately after the copyright notices, a license notice giving the public
permission to use the Modified Version under the terms of this License, in the form 
shown in the Addendum below. 


• Preserve in that license notice the full lists of Invariant Sections and required Cover
Texts given in the Document’s license notice. 


• Include an unaltered copy of this License.


• Preserve the section entitled “History”, and its title, and add to it an item stating
at least the title, year, new authors, and publisher of the Modified Version as given 
on the Title Page. If there is no section entitled “History” in the Document, create 
one stating the title, year, authors, and publisher of the Document as given on its 
Title Page, then add an item describing the Modified Version as stated in the previous 
sentence. 


• Preserve the network location, if any, given in the Document for public access to
a Transparent copy of the Document, and likewise the network locations given in the 
Document for previous versions it was based on. These may be placed in the “History” 
section. You may omit a network location for a work that was published at least four 
years before the Document itself, or if the original publisher of the version it refers to 
gives permission. 
• In any section entitled “Acknowledgements” or “Dedications”, preserve the section’s
title, and preserve in the section all the substance and tone of each of the contributor 
acknowledgements and/or dedications given therein. 


• Preserve all the Invariant Sections of the Document, unaltered in their text and in
their titles. Section numbers or the equivalent are not considered part of the section 
titles. 


• Delete any section entitled “Endorsements”. Such a section may not be included in
the Modified Version. 


• Do not retitle any existing section as “Endorsements” or to conflict in title with any
Invariant Section. 


If the Modified Version includes new front-matter sections or appendices that qualify as 
Secondary Sections and contain no material copied from the Document, you may at your 
option designate some or all of these sections as invariant. To do this, add their titles to 
the list of Invariant Sections in the Modified Version’s license notice. These titles must be 
distinct from any other section titles. 


You may add a section entitled “Endorsements”, provided it contains nothing but endorse- 
ments of your Modified Version by various parties — for example, statements of peer review 
or that the text has been approved by an organization as the authoritative definition of a 
standard. 


You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 
words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. 
Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or 
through arrangements made by) any one entity. If the Document already includes a cover 
text for the same cover, previously added by you or by arrangement made by the same entity 
you are acting on behalf of, you may not add another; but you may replace the old one, on 
explicit permission from the previous publisher that added the old one. 


The author(s) and publisher(s) of the Document do not by this License give permission to 
use their names for publicity for or to assert or imply endorsement of any Modified Version. 


Combining Documents 


You may combine the Document with other documents released under this License, under 
the terms defined in section 4 above for modified versions, provided that you include in the 
combination all of the Invariant Sections of all of the original documents, unmodified, and 
list them all as Invariant Sections of your combined work in its license notice. 


The combined work need only contain one copy of this License, and multiple identical
Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections
with the same name but different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original author or publisher of that 
section if known, or else a unique number. Make the same adjustment to the section titles 
in the list of Invariant Sections in the license notice of the combined work. 


In the combination, you must combine any sections entitled “History” in the various original 
documents, forming one section entitled “History”; likewise combine any sections entitled 
“Acknowledgements”, and any sections entitled “Dedications”. You must delete all sections 
entitled “Endorsements.” 


Collections of Documents 


You may make a collection consisting of the Document and other documents released under 
this License, and replace the individual copies of this License in the various documents with 
a single copy that is included in the collection, provided that you follow the rules of this 
License for verbatim copying of each of the documents in all other respects. 


You may extract a single document from such a collection, and distribute it individually 
under this License, provided you insert a copy of this License into the extracted document, 
and follow this License in all other respects regarding verbatim copying of that document. 


Aggregation With Independent Works 


A compilation of the Document or its derivatives with other separate and independent
documents or works, in or on a volume of a storage or distribution medium, does not as a whole
count as a Modified Version of the Document, provided no compilation copyright is claimed 
for the compilation. Such a compilation is called an “aggregate”, and this License does not 
apply to the other self-contained works thus compiled with the Document, on account of 
their being thus compiled, if they are not themselves derivative works of the Document. 


If the Cover Text requirement of section 3 is applicable to these copies of the Document, then 
if the Document is less than one quarter of the entire aggregate, the Document’s Cover Texts 
may be placed on covers that surround only the Document within the aggregate. Otherwise 
they must appear on covers around the whole aggregate. 


Translation 


Translation is considered a kind of modification, so you may distribute translations of the 
Document under the terms of section 4. Replacing Invariant Sections with translations 
requires special permission from their copyright holders, but you may include translations of
some or all Invariant Sections in addition to the original versions of these Invariant Sections. 
You may include a translation of this License provided that you also include the original 
English version of this License. In case of a disagreement between the translation and the 
original English version of this License, the original English version will prevail. 


Termination 


You may not copy, modify, sublicense, or distribute the Document except as expressly
provided for under this License. Any other attempt to copy, modify, sublicense or distribute the
Document is void, and will automatically terminate your rights under this License. However, 
parties who have received copies, or rights, from you under this License will not have their 
licenses terminated so long as such parties remain in full compliance. 


Future Revisions of This License 


The Free Software Foundation may publish new, revised versions of the GNU Free
Documentation License from time to time. Such new versions will be similar in spirit to
the present version, but may differ in detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.


Each version of the License is given a distinguishing version number. If the Document 
specifies that a particular numbered version of this License “or any later version” applies
to it, you have the option of following the terms and conditions either of that specified 
version or of any later version that has been published (not as a draft) by the Free Software 
Foundation. If the Document does not specify a version number of this License, you may 
choose any version ever published (not as a draft) by the Free Software Foundation. 


ADDENDUM: How to use this License for your documents


To use this License in a document you have written, include a copy of the License in the 
document and put the following copyright and license notices just after the title page: 


Copyright © YEAR YOUR NAME. Permission is granted to copy, distribute
and/or modify this document under the terms of the GNU Free Documentation
License, Version 1.1 or any later version published by the Free Software
Foundation; with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. A
copy of the license is included in the section entitled “GNU Free Documentation
License”.


If you have no Invariant Sections, write “with no Invariant Sections” instead of saying which 
ones are invariant. If you have no Front-Cover Texts, write “no Front-Cover Texts” instead 
of “Front-Cover Texts being LIST”; likewise for Back-Cover Texts. 


If your document contains nontrivial examples of program code, we recommend releasing 
these examples in parallel under your choice of free software license, such as the GNU General 
Public License, to permit their use in free software. 




Chapter 1 


UNCLA — Unclassified 


1.1 Golomb ruler 


A Golomb ruler of length $n$ is a ruler with only a subset of the integer markings
$\{0, a_2, \ldots, n\} \subset \{0, 1, 2, \ldots, n\}$ that appear on a regular ruler. The defining
criterion of this subset is that there exists an $m$ such that any positive integer $k \le m$
can be expressed uniquely as a difference $k = a_i - a_j$ for some $i, j$. This is referred
to as an $m$-Golomb ruler.


A 4-Golomb ruler of length 7 is given by $\{0, 1, 3, 7\}$. To verify this, we need to show that
every number $1, 2, \ldots, 4$ can be expressed as a difference of two numbers in the above set:
1 = 1 - 0
2 = 3 - 1
3 = 3 - 0
4 = 7 - 3


An optimal Golomb ruler is one where for a fixed value of $n$ the value of $a_n$ is minimized.
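

The verification above can also be done mechanically. The following is a minimal sketch, not part of the original entry; the function name `golomb_order` is an illustrative choice. It counts how often each difference occurs and returns the largest $m$ for which every $k \le m$ occurs exactly once.

```python
from itertools import combinations

def golomb_order(marks):
    """Largest m such that every k in 1..m is the difference of exactly
    one pair of marks (returns 0 if this already fails for k = 1)."""
    counts = {}
    for a, b in combinations(sorted(marks), 2):
        counts[b - a] = counts.get(b - a, 0) + 1
    m = 0
    while counts.get(m + 1, 0) == 1:
        m += 1
    return m

ruler = [0, 1, 3, 7]
print(golomb_order(ruler))  # 4
```

The differences of this ruler are 1, 2, 3, 4, 6 and 7, each occurring once; 5 is missing, which is why the order stops at 4.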
1.2 Hesse configuration


A Hesse configuration is a set $P$ of nine non-collinear points in the projective plane over
a field $K$ such that any line through two points of $P$ contains exactly three points of $P$.


Then there are 12 such lines through $P$. A Hesse configuration exists if and only if the field
$K$ contains a primitive third root of unity. For such $K$ the projective automorphism group
$PGL(3, K)$ acts transitively on all possible Hesse configurations.


The configuration $P$ with its 12 lines is isomorphic to the affine space
$A = \mathbb{F}^2$ where $\mathbb{F}$ is a field with three elements.


The group $\Gamma \subset PGL(3, K)$ of all symmetries that map $P$ onto itself has order 216 and it
is isomorphic to the group of affine transformations of $A$ that have determinant 1. The
stabilizer in $\Gamma$ of any of the 12 lines through $P$ is a cyclic subgroup of order three and $\Gamma$ is
generated by these subgroups.


The symmetry group $\Gamma$ is isomorphic to $G(K)/Z(K)$ where $G(K) \subset GL(3, K)$ is a group
of order 648 generated by reflections of order three and $Z(K)$ is its center of order
three. The reflection group $G(\mathbb{C})$ is called the Hesse group which appears as $G_{25}$ in the
classification of reflection groups by Shephard and Todd.


If $K$ is algebraically closed and the characteristic of $K$ is not 2 or 3 then the nine inflection
points of an elliptic curve $E$ over $K$ form a Hesse configuration.
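

The counts in this entry (nine points, 12 lines, three points per line) can be checked on the affine model over the field with three elements. This is only an illustrative sketch: it enumerates the lines of the affine plane directly and says nothing about the projective embedding. The helper `line_through` is a hypothetical name.

```python
from itertools import combinations, product

# The 9 points of the affine plane over the field with three elements.
points = list(product(range(3), repeat=2))

def line_through(p, q):
    """All points p + t*(q - p) for t = 0, 1, 2, with arithmetic mod 3."""
    dx, dy = (q[0] - p[0]) % 3, (q[1] - p[1]) % 3
    return frozenset(((p[0] + t * dx) % 3, (p[1] + t * dy) % 3) for t in range(3))

# Every pair of distinct points determines a line; collect the distinct ones.
lines = {line_through(p, q) for p, q in combinations(points, 2)}
print(len(points), len(lines), {len(l) for l in lines})  # 9 12 {3}
```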
1.3 Jordan’s Inequality


Jordan’s Inequality states $\frac{2}{\pi} x \le \sin(x) \le x$, $\forall x \in [0, \frac{\pi}{2}]$.
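

A quick numerical sanity check of the inequality on a grid of sample points is sketched below. It is not a proof; the small tolerance `tol` is an assumption added to absorb floating-point rounding at the endpoints.

```python
import math

def jordan_holds(x, tol=1e-12):
    """Check (2/pi)*x <= sin(x) <= x at a single point x in [0, pi/2]."""
    return (2 / math.pi) * x - tol <= math.sin(x) <= x + tol

# 1001 equally spaced sample points covering [0, pi/2].
samples = [i * (math.pi / 2) / 1000 for i in range(1001)]
print(all(jordan_holds(x) for x in samples))  # True
```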
1.4 Lagrange’s theorem
Lagrange’s theorem 


1: $G$ group
2: $H \le G$
3: $[G : H]$ index of $H$ in $G$
$\Rightarrow$: $|G| = |H| \, [G : H]$
Note: This is a “seed” entry written using a short-hand format described in this FAQ 
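

As a small illustration of the statement, the sketch below takes the cyclic group of integers mod 12 and the subgroup generated by 3 (both chosen here purely as an example), forms the cosets explicitly, and compares the counts.

```python
n = 12
G = set(range(n))                     # the integers mod 12 under addition
H = {(3 * k) % n for k in range(n)}   # subgroup generated by 3: {0, 3, 6, 9}

# The cosets g + H partition G; the number of distinct cosets is the index [G : H].
cosets = {frozenset((g + h) % n for h in H) for g in G}
index = len(cosets)

print(len(G), len(H), index)          # 12 4 3
```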
1.5 Laurent series


A Laurent series centered about $a$ is a series of the form

$$\sum_{k=-\infty}^{\infty} c_k (z - a)^k$$

where $c_k, a, z \in \mathbb{C}$.


One can prove that the above series converges everywhere inside the set

$$D := \{ z \in \mathbb{C} \mid R_1 < |z - a| < R_2 \}$$

where

$$R_1 := \limsup_{k \to \infty} |c_{-k}|^{1/k}$$

and

$$R_2 := 1 / \left( \limsup_{k \to \infty} |c_k|^{1/k} \right).$$

(This set may be empty.)


Every Laurent series has an associated function given by

$$f(z) = \sum_{k=-\infty}^{\infty} c_k (z - a)^k,$$

whose domain is the set of points in $\mathbb{C}$ on which the series converges. This function is
analytic inside the annulus $D$, and conversely, every analytic function on an annulus is equal to some
(unique) Laurent series.
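

As a concrete instance (chosen here for illustration, not taken from the entry), the function $1/(z(1-z))$ has the Laurent series $\sum_{k=-1}^{\infty} z^k$ about $a = 0$, so $c_k = 1$ for $k \ge -1$ and $c_k = 0$ otherwise. The formulas above then give $R_1 = 0$ and $R_2 = 1$. The sketch below compares a truncated sum with the closed form at one point of the annulus.

```python
def laurent_partial_sum(z, terms=200):
    """Sum of z**k for k = -1, 0, ..., terms - 1 (all coefficients equal to 1)."""
    return sum(z ** k for k in range(-1, terms))

z = 0.3 + 0.4j                        # |z| = 0.5, inside the annulus 0 < |z| < 1
exact = 1 / (z * (1 - z))
error = abs(laurent_partial_sum(z) - exact)
print(error)                          # tiny: only rounding error remains
```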
1.6 Lebesgue measure


Let $S \subseteq \mathbb{R}$, and let $S'$ be the complement of $S$ with respect to $\mathbb{R}$. We define $S$ to be
measurable if, for any $A \subseteq \mathbb{R}$,


$$m^*(A) = m^*(A \cap S) + m^*(A \cap S')$$


where $m^*(S)$ is the Lebesgue outer measure of $S$. If $S$ is measurable, then we define the
Lebesgue measure of $S$ to be $m(S) = m^*(S)$.


Lebesgue measure on $\mathbb{R}^n$ is the $n$-fold product measure of Lebesgue measure on $\mathbb{R}$.


Version: 2 Owner: vampyr Author(s): vampyr 
1.7 Leray spectral sequence


The Leray spectral sequence is a special case of the Grothendieck spectral sequence regarding
composition of functors.


If $f : X \to Y$ is a continuous map of topological spaces and if $\mathcal{F}$ is a sheaf of abelian groups
on $X$, then there is a spectral sequence

$$E_2^{pq} = H^p(Y, R^q f_* \mathcal{F}) \Rightarrow H^{p+q}(X, \mathcal{F})$$

where $f_*$ is the direct image functor.
Version: 1 Owner: bwebste Author(s): nerdy2 


1.8 Möbius transformation


A Möbius transformation is a bijection on the extended complex plane $\mathbb{C} \cup \{\infty\}$ given by

$$f(z) = \begin{cases} \dfrac{az + b}{cz + d} & z \ne -\frac{d}{c}, \infty \\ \dfrac{a}{c} & z = \infty \\ \infty & z = -\frac{d}{c} \end{cases}$$

where $a, b, c, d \in \mathbb{C}$ and $ad - bc \ne 0$.


It can be shown that the inverse of a Möbius transformation and the composition of two Möbius
transformations are similarly defined, and so the Möbius transformations form a group under composition.


The geometric interpretation of the Möbius group is that it is the group of automorphisms
of the Riemann sphere.


Any Möbius map can be composed from the elementary transformations: dilations,
translations and inversions. If we define a line to be a circle passing through $\infty$ then it can be
shown that a Möbius transformation maps circles to circles, by looking at each elementary
transformation.
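

One way to see that a composition is again of the same form is to note that composing two such maps corresponds to multiplying the coefficient matrices with rows $(a, b)$ and $(c, d)$. The sketch below checks this numerically at one finite point; the helper names `mobius` and `compose` and the sample coefficients are illustrative assumptions, and the special values at $\infty$ and at the pole are not handled.

```python
def mobius(a, b, c, d):
    """Return the map z -> (a z + b) / (c z + d) for finite z away from the pole."""
    assert a * d - b * c != 0
    return lambda z: (a * z + b) / (c * z + d)

def compose(m1, m2):
    """Coefficients of (m1 after m2), computed as a 2x2 matrix product."""
    a1, b1, c1, d1 = m1
    a2, b2, c2, d2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2)

m1, m2 = (1, 2j, 3, 4), (2, -1, 1j, 5)
z = 0.7 - 0.2j
lhs = mobius(*m1)(mobius(*m2)(z))     # apply m2, then m1
rhs = mobius(*compose(m1, m2))(z)     # apply the single composed map
print(abs(lhs - rhs))                 # agreement up to rounding
```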
1.9 Mordell-Weil theorem


If $E$ is an elliptic curve defined over a number field $K$, then the group of points with
coordinates in $K$ is a finitely generated abelian group.
1.10 Plateau’s Problem


“Plateau’s Problem” is the problem of finding the surface with minimal area among all
surfaces which have the same prescribed boundary.


This problem is named after the Belgian physicist Joseph Plateau (1801-1883) who
experimented with soap films. As a matter of fact, if you take a wire (which represents a closed curve
in three-dimensional space) and dip it in a solution of soapy water, you obtain a soapy
surface which has the wire as boundary. It turns out that this surface has the minimal area
among all surfaces with the same boundary, so the soap film is a solution to Plateau’s
Problem.


Jesse Douglas (1897-1965) solved the problem by proving the existence of such minimal
surfaces. The solution to the problem is achieved by finding a harmonic and conformal
parameterization of the surface.


The extension of the problem to higher dimensions (i.e. for $k$-dimensional surfaces in
$n$-dimensional space) turns out to be much more difficult to study. Moreover, while the solutions
to the original problem are always regular, it turns out that the solutions to the extended
problem may have singularities if $n \ge 8$. To solve the extended problem the theory of
currents (Federer and Fleming) has been developed.
1.11 Poisson random variable


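$X$ is a Poisson random variable with parameter $\lambda$ if

$$f_X(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x \in \{0, 1, 2, \ldots\}$$


Parameters:


$\lambda > 0$


$X \sim \mathrm{Poisson}(\lambda)$


Notes:


1. $X$ is often used to describe the occurrence of rare events. It’s a very commonly used
distribution in all fields of statistics.
2. $E[X] = \lambda$
3. $\mathrm{Var}[X] = \lambda$
4. $M_X(t) = e^{\lambda (e^t - 1)}$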
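

The mean and variance of a Poisson random variable both equal $\lambda$, which can be checked numerically from the probability function. The sketch below uses $\lambda = 2.5$ and truncates the sums at 100 terms; both choices are assumptions made for illustration only.

```python
import math

def poisson_pmf(x, lam):
    """Probability exp(-lam) * lam**x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.5
xs = range(100)   # truncation; the tail beyond 100 is negligible for lam = 2.5
total = sum(poisson_pmf(x, lam) for x in xs)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(total, mean, var)   # approximately 1, 2.5, 2.5
```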
1.12 Shannon’s theorem


Definition (Discrete) Let $(\Omega, \mathcal{F}, \mu)$ be a discrete probability space, and let $X$ be a discrete
random variable on $\Omega$.


The entropy $H[X]$ is defined as the functional


$$H[X] = -\sum_{x} \mu(X = x) \log \mu(X = x). \qquad (1.12.1)$$


Definition (Continuous) Entropy in the continuous case is called differential entropy.


Discussion—Discrete Entropy Entropy was first introduced by Shannon in 1948 in his 
landmark paper “A Mathematical Theory of Communication.” A modified and expanded
version of his argument is presented here.


Suppose we have a set of possible events whose probabilities of occurrence are $p_1, p_2, \ldots, p_n$.
These probabilities are known but that is all we know concerning which event will occur.
Can we find a measure of how much “choice” is involved in the selection of the event or of
how uncertain we are of the outcome? If there is such a measure, say $H(p_1, p_2, \ldots, p_n)$, it is
reasonable to require of it the following properties:


1. $H$ should be continuous in the $p_i$.


2. If all the $p_i$ are equal, $p_i = \frac{1}{n}$, then $H$ should be a monotonic increasing function of $n$.
With equally likely events there is more choice, or uncertainty, when there are more 
possible events. 


3. If a choice be broken down into two successive choices, the original H should be the 
weighted sum of the individual values of H. 


As an example of this last property, consider losing your luggage down a chute which feeds
three carousels, A, B and C. Assume that the baggage handling system is constructed such
that the probability of your luggage ending up on carousel A is $\frac{1}{2}$, on B is $\frac{1}{3}$, and on C is
$\frac{1}{6}$. These probabilities specify the $p_i$. There are two ways to think about your uncertainty
about where your luggage will end up.

First, you could consider your uncertainty to be $H(P_A, P_B, P_C) = H(\frac{1}{2}, \frac{1}{3}, \frac{1}{6})$. On the other
hand, you reason, no matter how byzantine the baggage handling system is, half the time
your luggage will end up on carousel A and half the time it will end up on carousels B
or C (with uncertainty $H(P_A, P_{B \vee C}) = H(\frac{1}{2}, \frac{1}{2})$). If it doesn’t go into A (and half the
time it won’t), then two-thirds of the time it shows up on B and one-third of the time
it winds up on carousel C (and your uncertainty about this second event, in isolation, is
$H(P_B, P_C) = H(\frac{2}{3}, \frac{1}{3})$). But remember this second event only happens half the time ($P_{B \vee C}$
of the time), so you must weight this second uncertainty appropriately, that is, by $\frac{1}{2}$. The
uncertainties computed using each of these chains of reasoning must be equal. That is,


$$H(P_A, P_B, P_C) = H(P_A, P_{B \vee C}) + P_{B \vee C} \, H(P_B, P_C)$$
$$H\left(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right) = H\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2} H\left(\tfrac{2}{3}, \tfrac{1}{3}\right)$$

If you’re not as lost as your luggage, then you may be interested in the following...


Theorem The only $H$ satisfying the three above assumptions is of the form:


$$H = -k \sum_{i=1}^{n} p_i \log p_i$$


$k$ is a constant, essentially a choice of unit of measure. The measure of uncertainty, $H$, is
called entropy, not to be confused (though it often is) with Boltzmann’s thermodynamic
entropy. The logarithm may be taken to the base 2, in which case $H$ is measured in “bits,”
or to the base $e$, in which case $H$ is measured in “nats.”
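

The luggage example can be checked numerically against the formula of the theorem. The sketch below takes $k = 1$ and base-2 logarithms (so the result is in bits); the function name `H` mirrors the notation of the text and is otherwise an illustrative choice.

```python
import math

def H(*p):
    """Entropy -sum p_i log2 p_i of a finite distribution, in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

direct = H(1/2, 1/3, 1/6)                          # uncertainty over A, B, C at once
two_stage = H(1/2, 1/2) + (1/2) * H(2/3, 1/3)      # A versus "B or C", then B versus C
print(direct, two_stage)                           # both about 1.459 bits
```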


Discussion—Continuous Entropy Despite its seductively analogous form, continuous 
entropy cannot be obtained as a limiting case of discrete entropy. 


We wish to obtain a generally finite measure as the “bin size” goes to zero. In the discrete
case, the bin size is the (implicit) width of each of the $n$ (finite or infinite) bins/buckets/states
whose probabilities are the $p_n$. As we generalize to the continuous case we must make
this width explicit.


To do this, start with a continuous function f discretized as shown in the figure:

Figure 1.1: Discretizing the function f into bins of width $\Delta$

As the figure indicates, by the mean-value theorem there exists a value $x_i$ in each bin such
that

\[ f(x_i)\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx \tag{1.12.2} \]

and thus the integral of the function f can be approximated (in the Riemannian sense) by

\[ \int_{-\infty}^{\infty} f(x)\,dx = \lim_{\Delta \to 0} \sum_{i=-\infty}^{\infty} f(x_i)\Delta \tag{1.12.3} \]

where this limit and “bin size goes to zero” are equivalent.

We will denote

\[ H^{\Delta} \equiv -\sum_{i=-\infty}^{\infty} \Delta f(x_i) \log \Delta f(x_i) \tag{1.12.4} \]

and expanding the log we have

\[ H^{\Delta} = -\sum \Delta f(x_i) \log \Delta f(x_i) \tag{1.12.5} \]
\[ \phantom{H^{\Delta}} = -\sum \Delta f(x_i) \log f(x_i) - \sum f(x_i) \Delta \log \Delta. \tag{1.12.6} \]

As $\Delta \to 0$, we have

\[ \sum f(x_i)\Delta \to \int f(x)\,dx = 1 \quad \text{and} \tag{1.12.7} \]
\[ \sum \Delta f(x_i) \log f(x_i) \to \int f(x) \log f(x)\,dx \tag{1.12.8} \]

This leads us to our definition of the differential entropy (continuous entropy):

\[ h[f] = \lim_{\Delta \to 0} \left[ H^{\Delta} + \log \Delta \right] = -\int_{-\infty}^{\infty} f(x) \log f(x)\,dx. \tag{1.12.9} \]
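
The limit in the definition of differential entropy can be checked numerically. The sketch below is an illustration under stated assumptions: it takes f to be the standard normal density (whose differential entropy in nats is known to be $\frac{1}{2}\log(2\pi e)$), uses bin midpoints as a stand-in for the mean-value points $x_i$, and truncates the real line to [-12, 12]. The function names are hypothetical.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def discretized_entropy(f, delta, lo=-12.0, hi=12.0):
    """H^Delta = -sum_i (Delta f(x_i)) log(Delta f(x_i)), natural log.

    Bin midpoints approximate the mean-value points x_i.
    """
    n_bins = int(round((hi - lo) / delta))
    total = 0.0
    for i in range(n_bins):
        p = delta * f(lo + (i + 0.5) * delta)
        if p > 0.0:
            total -= p * math.log(p)
    return total

exact = 0.5 * math.log(2.0 * math.pi * math.e)
for delta in (0.5, 0.1, 0.01):
    h_delta = discretized_entropy(normal_pdf, delta)
    # H^Delta alone grows without bound as delta shrinks;
    # H^Delta + log(delta) settles near the differential entropy.
    print(delta, h_delta, h_delta + math.log(delta), exact)
```

The printed columns show why continuous entropy is not simply the limit of discrete entropy: the $\log \Delta$ term has to be added back before the limit is finite.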
1.13 Shapiro inequality 


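Let $n \geq 3$ and let $x_1, x_2, \ldots, x_n \in \mathbb{R}_+$ be positive reals.
The following inequality

\[ \frac{x_1}{x_2 + x_3} + \frac{x_2}{x_3 + x_4} + \cdots + \frac{x_n}{x_1 + x_2} \geq \frac{n}{2} \]

with $x_i + x_{i+1} > 0$ is true for any even integer $n \leq 12$ and any odd integer $n \leq 23$.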
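
As an illustration, the cyclic sum can be evaluated directly. The Python sketch below spot-checks the inequality on random positive inputs for n = 6; the helper names and the sampling range are assumptions made for this example, and a random check is of course not a proof.

```python
import random

def shapiro_sum(xs):
    """Cyclic sum x_i / (x_{i+1} + x_{i+2}), indices taken modulo n."""
    n = len(xs)
    return sum(xs[i] / (xs[(i + 1) % n] + xs[(i + 2) % n]) for i in range(n))

def holds(xs, tol=1e-12):
    """True when the cyclic sum is at least n/2 (up to rounding)."""
    return shapiro_sum(xs) >= len(xs) / 2 - tol

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [[rng.uniform(0.01, 10.0) for _ in range(6)] for _ in range(1000)]
all_hold = all(holds(xs) for xs in samples)

print(shapiro_sum([1, 1, 1]))  # equal entries give exactly n/2
print(all_hold)
```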
1.14 Sylow p-subgroups 


Let G be a finite group and p be a prime that divides |G|. We can then write $|G| = p^k m$ for
some positive integer k so that p does not divide m.


Any subgroup of G whose order is $p^k$ is called a Sylow p-subgroup or simply p-subgroup.
The first Sylow theorem states that any group with order $p^k m$ has a Sylow p-subgroup.
1.15 Tschirnhaus transformations 


A polynomial transformation which transforms a polynomial to another with certain zero
coefficients is called a Tschirnhaus transformation. It is thus an invertible transformation
of the form $x \mapsto g(x)/h(x)$ where g, h are polynomials over the base field K (or some
subfield of the splitting field of the polynomial being transformed). If $\gcd(h(x), f(x)) = 1$
then the Tschirnhaus transformation becomes a polynomial transformation mod f.


Specifically, it concerns a substitution that reduces finding the roots of the polynomial

\[ p = T^n + a_1 T^{n-1} + \cdots + a_n = \prod_{i=1}^{n} (T - r_i) \in k[T] \]



to finding the roots of another q - with less parameters - and solving an auxiliary polynomial 
equation s, with deg(s) < deg(p Nq). 


Historically, the transformation was applied to reduce the general quintic equation to simpler
resolvents. Examples due to Hermite and Klein are respectively: The resolvent

\[ K(X) := X^5 + a_0 X^2 + a_1 X + a_3 \]

and the Bring-Jerrard form

\[ K(X) := X^5 + a_1 X + a_2 \]


Tschirnhaus transformations are also used when computing Galois groups to remove repeated
roots in resolvent polynomials. Almost any transformation will work but it is extremely hard
to find an efficient algorithm that can be proved to work.
1.16 Wallis formulae 


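\[ \int_0^{\pi/2} \sin^{2n} x \, dx = \frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots 2n} \cdot \frac{\pi}{2} \]

\[ \int_0^{\pi/2} \sin^{2n+1} x \, dx = \frac{2 \cdot 4 \cdots 2n}{3 \cdot 5 \cdots (2n+1)} \]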
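
A quick numerical comparison can make the two formulae concrete. The sketch below is only an illustration: it evaluates the closed-form products and compares them with a composite Simpson's rule approximation of the integrals; the function names and the number of Simpson steps are choices made for this example.

```python
import math

def wallis_even(n):
    """(1*3*...*(2n-1)) / (2*4*...*2n) * pi/2, the value for sin^(2n)."""
    value = math.pi / 2
    for k in range(1, n + 1):
        value *= (2 * k - 1) / (2 * k)
    return value

def wallis_odd(n):
    """(2*4*...*2n) / (3*5*...*(2n+1)), the value for sin^(2n+1)."""
    value = 1.0
    for k in range(1, n + 1):
        value *= (2 * k) / (2 * k + 1)
    return value

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

for n in range(4):
    even = simpson(lambda x: math.sin(x) ** (2 * n), 0.0, math.pi / 2)
    odd = simpson(lambda x: math.sin(x) ** (2 * n + 1), 0.0, math.pi / 2)
    print(n, even, wallis_even(n), odd, wallis_odd(n))
```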
1.17 ascending chain condition 


A collection S of subsets of a set X (that is, a subset of the power set of X) satisfies
the ascending chain condition or ACC if there does not exist an infinite ascending chain
$S_1 \subset S_2 \subset \cdots$ of subsets from S.


See also the descending chain condition (DCC).
1.18 bounded 


Let X be a subset of $\mathbb{R}$. We say that X is bounded when there exists a real number M such
that $|x| < M$ for all $x \in X$. When X is an interval we speak of a bounded interval.


This can be generalized first to $\mathbb{R}^n$. We say that $X \subset \mathbb{R}^n$ is bounded if there is a real number
M such that $\|x\| < M$ for all $x \in X$, where $\|\cdot\|$ is the Euclidean norm.
When we consider balls we speak of bounded balls.

This condition is equivalent to the statement: There is a real number T such that $\|x - y\| < T$
for all $x, y \in X$.


A further generalization to any metric space V says that $X \subset V$ is bounded when there is a
real number M such that $d(x, y) < M$ for all $x, y \in X$, where d represents the metric
on V.
1.19 bounded operator 
Definition [1]


1. Suppose X and Y are normed vector spaces with norms $\|\cdot\|_X$ and $\|\cdot\|_Y$. Further,
suppose T is a linear map $T : X \to Y$. If there is a $C \geq 0$ such that

\[ \|Tx\|_Y \leq C \|x\|_X \]

for all $x \in X$, then T is a bounded operator.


2. Let X and Y be as above, and let $T : X \to Y$ be a bounded operator. Then the norm
of T is defined as the real number

\[ \|T\| = \sup\left\{ \frac{\|Tx\|_Y}{\|x\|_X} \;\middle|\; x \in X \setminus \{0\} \right\}. \]

In the special case when X is the zero vector space, any linear map $T : X \to Y$ is the
zero map since T(0) = 0T(0) = 0. In this case, we define $\|T\| = 0$.


TODO:


1. The norm defined above is a norm.
2. Examples: identity operator; zero operator; see [1].
3. Give alternative expressions for norm of T (supremum taken over unit ball).
4. Discuss boundedness and continuity.


Theorem [I| B] Suppose T : X — Y is a linear map between [vector spaces| X and Y. If X 
is finite dimensional, then T is bounded 
Note: This is a “seed” entry written using a short-hand format described in this FAQ 
1.20 complex projective line 


complex projective line 


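1: $(z_1, z_2)$ complex numbers
2: $(z_1, z_2) \neq (0, 0)$
3: $\forall \lambda \in \mathbb{C} \setminus \{0\} : (\lambda z_1, \lambda z_2) \sim (z_1, z_2)$
4: $\{(\lambda z_1, \lambda z_2) \mid \lambda \in \mathbb{C} \setminus \{0\}\}/\sim$


Note: This is a “seed” entry written using a short-hand format described in this FAQ 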
1.21 converges uniformly 


Let X be a set, $(Y, \rho)$ a metric space and $\{f_n\}$ a sequence of functions from X to Y, and
$f : X \to Y$ another function.
If for any $\varepsilon > 0$ there exists an integer N such that

\[ \rho(f_n(x), f(x)) < \varepsilon \]

for all $x \in X$ and all $n > N$ we say that $f_n$ converges uniformly to f.
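
To see what the definition demands, it helps to look at the largest distance between $f_n$ and f over the whole set X. The Python sketch below is an illustration with assumed example functions $f_n(x) = x^n$ and f = 0: it estimates that largest distance on a finite grid, which only gives a lower bound for the true supremum. All names are hypothetical.

```python
def sup_distance(f, g, points):
    """Largest |f(x) - g(x)| over the sample points (a lower bound for the sup)."""
    return max(abs(f(x) - g(x)) for x in points)

def power(n):
    """The function x -> x**n."""
    return lambda x: x ** n

def zero(x):
    return 0.0

half = [i / 2000 for i in range(1001)]   # grid on [0, 1/2]
unit = [i / 1000 for i in range(1000)]   # grid on [0, 1)

for n in (1, 5, 20):
    # On [0, 1/2] the distance shrinks like (1/2)**n, so one N serves every x.
    # On [0, 1) the distance stays close to 1 however large n is.
    print(n, sup_distance(power(n), zero, half), sup_distance(power(n), zero, unit))
```

On [0, 1/2] the convergence to zero is uniform; on [0, 1) each point converges, but no single N works for all x at once.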
1.22 descending chain condition 


A collection S of subsets of a set X (that is, a subset of the power set of X) satisfies
the descending chain condition or DCC if there does not exist an infinite descending chain
$S_1 \supset S_2 \supset \cdots$ of subsets from S.


See also the ascending chain condition (ACC).
1.23 diamond theorem 


In the simplest case, the result states that every image of a two-colored “Diamond” figure (like
the figure in Plato’s Meno dialogue) under the action of the symmetric group of degree 4 has
some ordinary or color-interchange symmetry. The theorem generalizes to graphic designs
on 2x2x2, 4x4, and 4x4x4 arrays. It is of interest because it relates classical


symmetries to underlying that come from rather than from classical 


geometry. The group actions in the 4x4 case of the theorem throw some light on the R. T.
Curtis “miracle octad generator” approach to the large Mathieu group.
Version: 2 Owner: m759 Author(s): m759 


1.24 equivalently oriented bases 
equivalently oriented bases 


1: V finite-dimensional vector space
2: $(v_1, \ldots, v_n)$ ordered basis for V
3: $(w_1, \ldots, w_n)$ ordered basis for V


fact: there is a unique linear isomorphism taking a given basis to
another given basis

1: V finite-dimensional vector space
2: $(v_1, \ldots, v_n)$ ordered basis for V
3: $(w_1, \ldots, w_n)$ ordered basis for V
4: $\exists A : V \to V$ linear isomorphism : $\forall i \in \{1, \ldots, n\} : A v_i = w_i$


Note: This is a “seed” entry written using a short-hand format described in this FAQ 
1.25 finitely generated R-module 


finitely generated R-module 


1: X module over R
2: $Y \subset X$
3: X generated by Y
4: Y finite


Note: This is a “seed” entry written using a short-hand format described in this FAQ 
1.26 fraction 


A fraction is a rational number expressed in the form $\frac{n}{d}$ or n/d, where n is designated the
numerator and d the denominator. The slash between them is known as a solidus when
the fraction is expressed as n/d.


The fraction n/d has value $n \div d$. For instance, $3/2 = 3 \div 2 = 1.5$.


If n/d < 1, then n/d is known as a proper fraction. Otherwise, it is an improper
fraction. If n and d are relatively prime, then n/d is said to be in lowest terms. To
get a fraction in lowest terms, simply divide the numerator and the denominator by their
greatest common divisor:

\[ \frac{60}{84} = \frac{60 \div 12}{84 \div 12} = \frac{5}{7} \]

The rules for manipulating fractions are

\[ \frac{a}{b} = \frac{ka}{kb} \]
\[ \frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd} \]
\[ \frac{a}{b} - \frac{c}{d} = \frac{ad - bc}{bd} \]
\[ \frac{a}{b} \times \frac{c}{d} = \frac{ac}{bd} \]
\[ \frac{a}{b} \div \frac{c}{d} = \frac{ad}{bc} \]
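
The rules above can be tried out with exact arithmetic. The following Python sketch uses the standard library's `fractions.Fraction` and `math.gcd`; the helper `lowest_terms` and the sample values 3/4 and 5/6 are illustrative choices, not part of the entry.

```python
from fractions import Fraction
from math import gcd

def lowest_terms(n, d):
    """Divide numerator and denominator by their greatest common divisor."""
    g = gcd(n, d)
    return n // g, d // g

print(lowest_terms(60, 84))  # the worked example: gcd(60, 84) = 12

a, b, c, d = 3, 4, 5, 6
x, y = Fraction(a, b), Fraction(c, d)

# Each rule, checked against Python's exact rational arithmetic.
print(x + y == Fraction(a * d + b * c, b * d))
print(x - y == Fraction(a * d - b * c, b * d))
print(x * y == Fraction(a * c, b * d))
print(x / y == Fraction(a * d, b * c))
```

`Fraction` always stores its value in lowest terms, so comparisons with `==` test equality of values rather than of the written form.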
1.27 group of covering transformations 


group of covering transformations 


1: $(\{h : X \to X \mid h \text{ covering transformation}\}, \circ)$


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: bwebste Author(s): apmxi 


1.28 idempotent 


idempotent 


1: R ring
2: $r \in R$
3: $r^2 = r$


The following facts hold in commutative rings.

fact: if r is idempotent, then $1 - r$ is idempotent

1: R ring
2: $r \in R$
3: r idempotent
4: $1 - r$ idempotent


fact: if r is idempotent, then rR is a ring

1: R ring
2: $r \in R$
3: r idempotent
4: rR is a ring


fact: if r is idempotent, then rR has identity r

1: R ring
2: $r \in R$
3: r idempotent
4: $\forall s \in rR : rs = sr = s$


fact: if r is idempotent, then $R \cong rR \times (1 - r)R$

1: R ring
2: $r \in R$
3: r idempotent
4: $R \cong rR \times (1 - r)R$
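
A small concrete case may help. The Python sketch below works in the commutative ring of integers modulo n (an example ring assumed here, not taken from the entry), lists its idempotents, and checks the first and third facts by brute force; the function names are hypothetical.

```python
def idempotents(n):
    """Elements r of Z/nZ with r*r = r."""
    return [r for r in range(n) if (r * r) % n == r]

def principal_ideal(r, n):
    """The set rR = {r*s : s in R} inside R = Z/nZ."""
    return sorted({(r * s) % n for s in range(n)})

n = 6
for r in idempotents(n):
    complement = (1 - r) % n
    ideal = principal_ideal(r, n)
    # r acts as the identity on rR, and 1 - r is again idempotent.
    acts_as_identity = all((r * s) % n == s for s in ideal)
    print(r, complement, ideal, acts_as_identity)
```

For n = 6 the idempotents 3 and 4 give the ideals {0, 3} and {0, 2, 4}, of sizes 2 and 3, matching the product decomposition in the last fact.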
Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 3 Owner: bwebste Author(s): apmxi 
1.29 isolated 


Let X be a topological space, let $S \subset X$, and let $x \in S$. The point x is said to be an isolated
point of S if there exists an open set $U \subset X$ such that $U \cap S = \{x\}$.


The set S is isolated if every point in S is an isolated point. 


Version: 1 Owner: djao Author(s): djao 


1.30 isolated singularity 


isolated singularity 


1: $f : U \subset \mathbb{C} \to \mathbb{C} \cup \{\infty\}$
2: $z_0 \in U$
3: f analytic on $U \setminus \{z_0\}$


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: bwebste Author(s): apmxi 


1.31 isomorphic groups 


isomorphic groups 


1: (X1, *1), (X2, *2) 
2: $f : X_1 \to X_2$ isomorphism


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: Thomas Heye Author(s): apmxi 
1.32 joint continuous density function 


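Let $X_1, X_2, \ldots, X_n$ be n random variables all defined on the same probability space. The joint
continuous density function of $X_1, X_2, \ldots, X_n$, denoted by $f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$, is the
function

\[ f_{X_1, X_2, \ldots, X_n} : \mathbb{R}^n \to \mathbb{R} \]

such that

\[ \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \cdots \int_{-\infty}^{x_n} f_{X_1, X_2, \ldots, X_n}(u_1, u_2, \ldots, u_n) \, du_1 \, du_2 \cdots du_n = F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) \]

As in the case where n = 1, this function satisfies:


1. $f_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n) \geq 0 \quad \forall (x_1, \ldots, x_n)$


2. $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1, X_2, \ldots, X_n}(u_1, u_2, \ldots, u_n) \, du_1 \, du_2 \cdots du_n = 1$


As in the single variable case, $f_{X_1, X_2, \ldots, X_n}$ does not represent the probability that each of the
random variables takes on each of the values.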


Version: 4 Owner: Riemann Author(s): Riemann 


1.33 joint cumulative distribution function 


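Let $X_1, X_2, \ldots, X_n$ be n random variables all defined on the same probability space. The joint
cumulative distribution function of $X_1, X_2, \ldots, X_n$, denoted by $F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$,
is the following function:

\[ F_{X_1, X_2, \ldots, X_n} : \mathbb{R}^n \to \mathbb{R} \]
\[ F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = P[X_1 \leq x_1, X_2 \leq x_2, \ldots, X_n \leq x_n] \]

As in the unidimensional case, this function satisfies:


1. $\lim_{(x_1, \ldots, x_n) \to (-\infty, \ldots, -\infty)} F_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n) = 0$ and $\lim_{(x_1, \ldots, x_n) \to (\infty, \ldots, \infty)} F_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n) = 1$


2. $F_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n)$ is a monotone, nondecreasing function.

3. $F_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n)$ is continuous from the right in each variable.


The way to evaluate $F_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n)$ is the following:

\[ F_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \cdots \int_{-\infty}^{x_n} f_{X_1, X_2, \ldots, X_n}(u_1, \ldots, u_n) \, du_1 \, du_2 \cdots du_n \]

(if F is continuous) or

\[ F_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n) = \sum_{i_1 \leq x_1} \sum_{i_2 \leq x_2} \cdots \sum_{i_n \leq x_n} f_{X_1, X_2, \ldots, X_n}(i_1, \ldots, i_n) \]

(if F is discrete),


where $f_{X_1, X_2, \ldots, X_n}$ is the joint density function of $X_1, \ldots, X_n$.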


Version: 3 Owner: Riemann Author(s): Riemann 


1.34 joint discrete density function 


Let $X_1, X_2, \ldots, X_n$ be n random variables all defined on the same probability space. The
joint discrete density function of $X_1, X_2, \ldots, X_n$, denoted by $f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$, is
the following function:

\[ f_{X_1, X_2, \ldots, X_n} : \mathbb{R}^n \to \mathbb{R} \]
\[ f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n] \]

As in the single variable case, sometimes it’s expressed as $p_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$ to mark
the difference between this function and the joint density function.
Also, as in the case where n = 1, this function satisfies:


1. $f_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n) \geq 0 \quad \forall (x_1, \ldots, x_n)$


2. $\sum_{x_1, \ldots, x_n} f_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n) = 1$


In this case, $f_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n) = P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n]$.

Version: 3 Owner: Riemann Author(s): Riemann 


1.35 left function notation 


We are said to be using left function notation if we write functions to the left of their
arguments. That is, if $\alpha : X \to Y$ is a function and $x \in X$, then $\alpha x$ is the image of x under
$\alpha$.


Furthermore, if we have a function $\beta : Y \to Z$, then we write the composition of the two
functions as $\beta\alpha : X \to Z$, and the image of x under the composition as $\beta\alpha x = (\beta\alpha)x =
\beta(\alpha x)$.

Compare this to right function notation.


Version: 1 Owner: antizeus Author(s): antizeus 


1.36 lift of a submanifold 
lift of a submanifold 


1: X, Y topological manifolds
2: $Z \subset Y$ submanifold
3: $g : Z \to Y$ inclusion
4: $\tilde{g}$ lift of g
5: $\tilde{g}(Z)$


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: bwebste Author(s): apmxi 


1.37 limit of a real function exists at a point 


Let $X \subset \mathbb{R}$ be an open set of real numbers and $f : X \to \mathbb{R}$ a function.
If $x_0 \in X$, we say that f is continuous at $x_0$ if for any $\varepsilon > 0$ there exists $\delta$ positive such that

\[ |f(x) - f(x_0)| < \varepsilon \]

whenever

\[ |x - x_0| < \delta. \]

Based on apmxi.


Version: 2 Owner: drini Author(s): drini, apmxi 


1.38  lipschitz function 


lipschitz function 


1: $f : \mathbb{R} \to \mathbb{C}$


2: $\exists M \in \mathbb{R} : \forall x, y \in \mathbb{R} : |f(x) - f(y)| \leq M |x - y|$


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: bwebste Author(s): apmxi 


1.39 lognormal random variable 


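X is a lognormal random variable with parameters $\mu$ and $\sigma^2$ if

\[ f_X(x) = \frac{1}{x \sqrt{2\pi\sigma^2}} \, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad x > 0 \]

Parameters:

* $\mu \in \mathbb{R}$
* $\sigma^2 > 0$

$X \sim \mathrm{LogN}(\mu, \sigma^2)$

Notes:

1. X is a random variable such that ln(X) is a normal random variable with mean $\mu$ and
variance $\sigma^2$.

2. $E[X] = e^{\mu + \sigma^2/2}$

3. $\mathrm{Var}[X] = e^{2\mu + \sigma^2} (e^{\sigma^2} - 1)$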


4. Mx(t) not useful 


Version: 2 Owner: Riemann Author(s): Riemann 


1.40 lowest upper bound 


Let S be a set with an ordering relation $\leq$, and let T be a subset of S. A lowest upper bound
of T is an upper bound x of T with the property that $x \leq y$ for every upper bound y of T.


A lowest upper bound of T, when it exists, is unique.


Greatest lower bound is defined similarly: a greatest lower bound of T is a lower bound x of
T with the property that $x \geq y$ for every lower bound y of T.


Version: 3 Owner: djao Author(s): djao 
1.41 marginal distribution 


Given random variables $X_1, X_2, \ldots, X_n$ and a subset $I \subset \{1, 2, \ldots, n\}$, the marginal
distribution of the random variables $X_i : i \in I$ is the following:

\[ f_{\{X_i : i \in I\}}(x) = \sum_{\{x_i : i \notin I\}} f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) \quad \text{or} \]
\[ f_{\{X_i : i \in I\}}(x) = \int_{\{x_i : i \notin I\}} f_{X_1, \ldots, X_n}(u_1, \ldots, u_n) \prod_{i \notin I} du_i, \]

summing if the variables are discrete and integrating if the variables are continuous.


That is, the marginal distribution of a set of random variables can be obtained by
summing (or integrating) the joint distribution over all values of the other variables.

The most common marginal distribution is the individual marginal distribution (i.e., the
marginal distribution of ONE random variable).
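
For discrete variables the summation can be carried out mechanically. The Python sketch below is a minimal illustration in which a joint density is stored as a dictionary from outcome tuples to probabilities; this representation, the function name `marginal`, and the sample numbers are all assumptions made for the example.

```python
from collections import defaultdict

def marginal(joint, keep):
    """Sum a joint discrete density over every variable whose index is not in `keep`."""
    out = defaultdict(float)
    for outcome, prob in joint.items():
        out[tuple(outcome[i] for i in keep)] += prob
    return dict(out)

# Joint density of (X1, X2), each taking the values 0 or 1.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

print(marginal(joint, [0]))  # individual marginal of X1
print(marginal(joint, [1]))  # individual marginal of X2
```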


Version: 4 Owner: Riemann Author(s): Riemann 


1.42 measurable space 


A measurable space is a set E together with a collection B(E) of subsets of E which is a
sigma algebra.


The elements of B(E) are called measurable sets.


Version: 3 Owner: djao Author(s): djao 


1.43 measure zero 


measure zero 


1: $(X, M, \mu)$ measure space
2: $A \in M$
3: $\mu(A) = 0$


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: bwebste Author(s): apmxi 


1.44 minimum spanning tree 


Given a graph G with weighted edges, a minimum spanning tree is a spanning tree with
minimum weight, where the weight of a spanning tree is the sum of the weights of its edges.
There may be more than one minimum spanning tree for a graph, since it is the weight of
the spanning tree that must be minimum.


For example, here is a graph G of weighted edges and a minimum spanning tree T for that
graph. The edges of T are drawn as solid lines, while edges in G but not in T are drawn as
dotted lines.

Prim’s algorithm or Kruskal’s algorithm can compute the minimum spanning tree of a graph. 
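
As a rough illustration of how Kruskal's algorithm proceeds, the Python sketch below sorts the edges by weight and keeps an edge whenever it joins two components that are not yet connected. The edge format `(weight, u, v)`, the union-find helper and the small sample graph are assumptions made for this example.

```python
def kruskal(num_vertices, edges):
    """Return the edges of a minimum spanning tree of a connected graph.

    `edges` is a list of (weight, u, v) with vertices numbered 0..num_vertices-1.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent links to the component representative (with path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the edge joins two different components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree

edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 1, 3), (3, 2, 3)]
tree = kruskal(4, edges)
print(tree, sum(w for w, _, _ in tree))
```

When several edges share the same weight, the order in which they are examined can change which tree is returned, but not its total weight.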


Version: 3 Owner: Logan Author(s): Logan 


1.45 minimum weighted path length 


Given a list of weights, $W := \{w_1, w_2, \ldots, w_n\}$, the minimum weighted path length is the
minimum of the weighted path length of all extended binary trees that have n external nodes
with weights taken from W. There may be multiple possible trees that give this minimum
path length, and quite often finding this tree is more important than determining the path
length.


Example


Let W := {1, 2, 3, 3, 4}. The minimum weighted path length is 29. A tree that gives this
weighted path length is shown below.


Applications


Constructing a tree of minimum weighted path length for a given set of weights has several
applications, particularly dealing with optimization problems. An elegant algorithm
for constructing such a tree is Huffman’s algorithm. Such a tree can give the
optimal algorithm for merging n sorted sequences (optimal merge). It can also provide a
means of compressing data (Huffman coding), as well as lead to optimal searches.
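
The value in the example can be reproduced with the greedy merging procedure used in Huffman coding: repeatedly combine the two smallest weights, and add each combined weight to a running total. The Python sketch below is one way to do this with the standard `heapq` module; the function name is chosen for illustration.

```python
import heapq

def min_weighted_path_length(weights):
    """Minimum weighted path length over extended binary trees with these leaf weights."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        # Merging the two lightest subtrees pushes their leaves one level deeper,
        # which adds the merged weight to the weighted path length.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

print(min_weighted_path_length([1, 2, 3, 3, 4]))
```

The sketch returns only the length; recording which pairs are merged would recover a tree that attains it.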


Version: 2 Owner: Logan Author(s): Logan 
1.46 mod 2 intersection number 


mod 2 intersection number 


case: transversal map 


1: X smooth manifold

2: X compact

3: Y smooth manifold

4: $Z \subset Y$ closed submanifold

5: $f : X \to Y$ smooth

6: Z and X have complementary dimension
7: f transversal to Z

8: $|f^{-1}(Z)| \pmod 2$


case: nontransversal map 


1: X smooth manifold 

2: X compact 

3: Y smooth manifold 

4: $Z \subset Y$ closed submanifold
5: $f : X \to Y$ smooth

6: dim(X) + dim(Z) = dim(Y)
7: g homotopic to f

8: g transversal to Z


9: $|g^{-1}(Z)| \pmod 2$
fact: a homotopic transversal map exists 


1: X smooth manifold 

2: X compact 

3: Y smooth manifold 

4: $Z \subset Y$ closed submanifold
5: $f : X \to Y$ smooth


6: dim(X) + dim(Z) = dim(Y)


7: $\exists g$ homotopic to f : g transversal to Z


fact: two homotopic transversal maps have the same mod 2 inter- 
section number 


1: X smooth manifold 

2: X compact 

3: Y smooth manifold 

4: $Z \subset Y$ closed submanifold
5: $f_1, f_2 : X \to Y$ smooth

6: $f_1$ homotopic to $f_2$


7: $I_2(f_1, Z) = I_2(f_2, Z)$


fact: boundary theorem 


1: X manifold with boundary

2: Y manifold

3: $Z \subset Y$ submanifold

4: Z and $\partial X$ have complementary dimension


5: $g : \partial X \to Y$
6: g can be extended to X
7: $I_2(g, Z) = 0$


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: bwebste Author(s): apmxi 


1.47 moment generating function 


Given a random variable X, the moment generating function of X is the following function:

\[ M_X(t) = E[e^{tX}] \quad \text{for } t \in \mathbb{R} \text{ (if the expectation converges).} \]

It can be shown that if the moment generating function of X is defined on an interval around
the origin, then

\[ E[X^k] = M_X^{(k)}(t) \Big|_{t=0} \]

In other words, the kth derivative of the moment generating function evaluated at zero is
the kth moment of X.


Version: 1 Owner: Riemann Author(s): Riemann 


1.48 monoid 


A monoid is a semigroup G which contains an identity element; that is, there exists an
element $e \in G$ such that $e \cdot a = a \cdot e = a$ for all $a \in G$.


Version: 1 Owner: djao Author(s): djao 


1.49 monotonic operator 


For a poset X, an operator T is a monotonic operator if for all $x, y \in X$, $x \leq y$ implies
$T(x) \leq T(y)$.


Version: 1 Owner: Logan Author(s): Logan 
1.50 multidimensional Gaussian integral 


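Let N(0, K) be an unnormalized multidimensional Gaussian with mean 0 and covariance
matrix K, $K_{ij} = \mathrm{cov}(x_i, x_j)$. K is symmetric by the identity $\mathrm{cov}(x_i, x_j) = \mathrm{cov}(x_j, x_i)$. Let
$x = [x_1 \; x_2 \; \ldots \; x_n]^T$ and $d^n x = \prod_{i=1}^{n} dx_i$.
It is easy to see that $N(0, K) = \exp\left(-\frac{1}{2} x^T K^{-1} x\right)$. How can we normalize N(0, K)?


We can show that

\[ \int e^{-\frac{1}{2} x^T K^{-1} x} \, d^n x = \left((2\pi)^n |K|\right)^{\frac{1}{2}} \tag{1.50.1} \]

where |K| = det K.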
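
Before following the derivation, the claimed value can be sanity-checked numerically in two dimensions. The Python sketch below is an illustration only: the covariance matrix K is an arbitrary example, the integral is approximated by a midpoint rule on a truncated square grid, and the function name and grid parameters are choices made here.

```python
import math

def gaussian_integral_2d(K, half_width=12.0, step=0.1):
    """Midpoint-rule estimate of the integral of exp(-x^T K^{-1} x / 2) over the plane."""
    (a, b), (c, d) = K
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # inverse of a 2x2 matrix
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * step
        for j in range(n):
            y = -half_width + (j + 0.5) * step
            quad = inv[0][0] * x * x + (inv[0][1] + inv[1][0]) * x * y + inv[1][1] * y * y
            total += math.exp(-0.5 * quad)
    return total * step * step

K = ((2.0, 0.5), (0.5, 1.0))                       # symmetric, positive definite
det_K = K[0][0] * K[1][1] - K[0][1] * K[1][0]
numeric = gaussian_integral_2d(K)
closed_form = math.sqrt((2 * math.pi) ** 2 * det_K)
print(numeric, closed_form)
```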


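$K^{-1}$ is real and symmetric (since $(K^{-1})^T = (K^T)^{-1} = K^{-1}$). For convenience, let $A = K^{-1}$.
We can decompose A into $A = T \Lambda T^{-1}$, where T is an orthonormal ($T^T T = I$) matrix of
the eigenvectors of A and $\Lambda$ is a diagonal matrix of the eigenvalues of A. Then

\[ \int e^{-\frac{1}{2} x^T A x} \, d^n x = \int e^{-\frac{1}{2} x^T T \Lambda T^{-1} x} \, d^n x \tag{1.50.2} \]

Because T is orthonormal, we have $T^{-1} = T^T$. Now define a new vector variable $y \equiv T^T x$,
and substitute:

\[ \int e^{-\frac{1}{2} x^T T \Lambda T^{-1} x} \, d^n x = \int e^{-\frac{1}{2} x^T T \Lambda T^T x} \, d^n x \tag{1.50.3} \]
\[ = \int e^{-\frac{1}{2} (T^T x)^T \Lambda (T^T x)} \, d^n x \tag{1.50.4} \]
\[ = \int e^{-\frac{1}{2} y^T \Lambda y} \, |J| \, d^n y \tag{1.50.5} \]

where |J| is the determinant of the Jacobian matrix $J_{mn} = \frac{\partial x_m}{\partial y_n}$. In this case, J = T and
thus |J| = 1.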


Now we’re in business, because $\Lambda$ is diagonal and thus the integral may be separated into
the product of n independent Gaussians, each of which we can integrate separately using the
well-known formula

\[ \int e^{-\frac{1}{2} a t^2} \, dt = \left(\frac{2\pi}{a}\right)^{\frac{1}{2}}. \tag{1.50.6} \]


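Carrying out this program, we get

\[ \int e^{-\frac{1}{2} y^T \Lambda y} \, d^n y = \prod_{k=1}^{n} \int e^{-\frac{1}{2} \lambda_k y_k^2} \, dy_k \tag{1.50.7} \]
\[ = \prod_{k=1}^{n} \left(\frac{2\pi}{\lambda_k}\right)^{\frac{1}{2}} \tag{1.50.8} \]
\[ = \left(\frac{(2\pi)^n}{\prod_{k=1}^{n} \lambda_k}\right)^{\frac{1}{2}} \tag{1.50.9} \]
\[ = \left(\frac{(2\pi)^n}{|\Lambda|}\right)^{\frac{1}{2}} \tag{1.50.10} \]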


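Now, we have $|A| = |T \Lambda T^{-1}| = |T| |\Lambda| |T^{-1}| = |\Lambda| |T| |T|^{-1} = |\Lambda|$, so this becomes

\[ \int e^{-\frac{1}{2} x^T A x} \, d^n x = \left(\frac{(2\pi)^n}{|A|}\right)^{\frac{1}{2}} \tag{1.50.12} \]

Substituting back in for $K^{-1}$, we get

\[ \int e^{-\frac{1}{2} x^T K^{-1} x} \, d^n x = \left(\frac{(2\pi)^n}{|K^{-1}|}\right)^{\frac{1}{2}} = \left((2\pi)^n |K|\right)^{\frac{1}{2}}, \tag{1.50.13} \]

as promised.


Version: 4 Owner: drini Author(s): drini, drummond 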


1.51 multiindex 


multiindex 


Let $n \in \mathbb{N}$. Then an element $\alpha \in \mathbb{N}^n$ is called a multiindex.


Version: 2 Owner: mike Author(s): mike, apmxi 
1.52 near operators 


1.52.1 Perturbations and small perturbations: definitions and some 
results 


We start our discussion on the Campanato theory of near operators with some preliminary 
tools. 


Let X, Y be two sets and let a metric d be defined on Y. If $F : X \to Y$ is an injective map,
we can define a metric on X by putting:

\[ d_F(x', x'') = d(F(x'), F(x'')). \]

Indeed, $d_F$ is zero if and only if $x' = x''$ (since F is injective); $d_F$ is symmetric, and
the triangle inequality follows from the triangle inequality of d.


If moreover F(X) is a complete subspace of Y, then X is complete with respect to the metric $d_F$.
Indeed, let $(u_n)$ be a Cauchy sequence in X. By definition of $d_F$, then $(F(u_n))$ is a Cauchy
sequence in Y, and in particular in F(X), which is complete. Thus, there exists $y_0 =
F(x_0) \in F(X)$ which is the limit of the sequence $(F(u_n))$. $x_0$ is the limit of $(u_n)$ in $(X, d_F)$,
which completes the proof.


A particular case of the previous statement is when F is surjective (and thus a bijection) and
(Y, d) is complete.


Similarly, if F(X) is compact in Y, then X is compact with the metric $d_F$.


Definition 1. Let X be a set and Y be a metric space. Let F, G be two maps from X to
Y. We say that G is a perturbation of F if there exists a constant k > 0 such that for each
$x', x'' \in X$ one has:

\[ d(G(x'), G(x'')) \leq k \, d(F(x'), F(x'')) \]

remark 1. In particular, if F is injective then G is a perturbation of F if G is uniformly continuous
with respect to the metric induced on X by F.


Definition 2. Under the same hypotheses as in the previous definition, we say that G is a small
perturbation of F if it is a perturbation of constant k < 1.


We can now prove this generalization of the Banach-Caccioppoli fixed point theorem.


Theorem 1. Let X be a set and (Y, d) be a complete metric space. Let F, G be two mappings
from X to Y such that:


1. F is bijective;
2. G is a small perturbation of F.


Then, there exists a unique $u \in X$ such that G(u) = F(u).


Proof. The hypothesis (1) ensures that the metric space $(X, d_F)$ is complete. If we now consider
the function $T : X \to X$ defined by

\[ T(x) = F^{-1}(G(x)) \]

we note that, by (2), we have

\[ d(G(x'), G(x'')) \leq k \, d(F(x'), F(x'')) \]

where $k \in (0, 1)$ is the constant of the small perturbation; note that, by the definition of $d_F$
and applying $F \circ F^{-1}$ to the first side, the last equation can be rewritten as

\[ d_F(T(x'), T(x'')) \leq k \, d_F(x', x''); \]

in other words, since k < 1, T is a contraction in the complete metric space $(X, d_F)$; therefore
(by the classical Banach-Caccioppoli fixed point theorem) T has a unique fixed point: there
exists $u \in X$ such that T(u) = u; by definition of T this is equivalent to G(u) = F(u), and
the proof is hence complete.
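
The proof is constructive: iterating $T = F^{-1} \circ G$ from any starting point converges to the solution of G(u) = F(u). The Python sketch below illustrates this on the real line with the assumed example F(x) = 2x and G(x) = cos(x), for which $|G(x') - G(x'')| \leq |x' - x''| = \frac{1}{2}|F(x') - F(x'')|$, so that G is a small perturbation of F with k = 1/2. The function name, tolerance and starting point are illustrative choices.

```python
import math

def solve_equal(F_inv, G, x0, tol=1e-12, max_iter=1000):
    """Find u with G(u) = F(u) by iterating T = F^{-1} o G from x0."""
    x = x0
    for _ in range(max_iter):
        nxt = F_inv(G(x))
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("iteration did not converge")

# F(x) = 2x is a bijection of the real line with inverse y -> y / 2.
u = solve_equal(lambda y: y / 2.0, math.cos, x0=0.0)
print(u, math.cos(u), 2.0 * u)
```

Because T is a contraction with constant k, the distance to the solution shrinks by at least the factor k at every step, whatever the starting point.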


remark 2. The hypothesis of the theorem can be generalized as such: let X be a set and
Y a metric space (not necessarily complete); let F, G be two mappings from X to Y such
that F is injective, F(X) is complete and $G(X) \subset F(X)$; then there exists $u \in X$ such that
G(u) = F(u).


(Apply the theorem using F(X) instead of Y as target space.)


remark 3. The Banach-Caccioppoli fixed point theorem is obtained when X = Y and F is
the identity.


We can use theorem 1 to prove a result that applies to perturbations which are not necessarily
small (i.e. for which the constant k can be greater than one). To prove it, we must assume
some supplemental structure on the metric of Y: in particular, we have to assume that the
metric d is invariant by dilations, that is that $d(\alpha y', \alpha y'') = \alpha \, d(y', y'')$ for each $y', y'' \in Y$.
The most common case of such a metric is when the metric is deduced from a norm (i.e. when
Y is a normed space, and in particular a Banach space). The result follows immediately:


Corollary 1. Let X be a set and (Y, d) be a complete metric space with a metric d invariant
by dilations. Let F, G be two mappings from X to Y such that F is bijective and G is a
perturbation of F, with constant K > 0.


Then, for each M > K there exists a unique $u_M \in X$ such that $G(u_M) = M F(u_M)$.


Proof. The proof is an immediate consequence of theorem 1 given that the map $\tilde{G}(u) = G(u)/M$
is a small perturbation of F (a fact which is ensured by the dilation invariance of the
metric d).

We also have the following


Corollary 2. Let X be a set and (Y, d) be a complete, compact metric space with a metric
d invariant by dilations. Let F, G be two mappings from X to Y such that F is bijective and
G is a perturbation of F, with constant K > 0.


Then there exists at least one $u_\infty \in X$ such that $G(u_\infty) = K F(u_\infty)$.


Proof. Let $(a_n)$ be a decreasing sequence of real numbers greater than one, converging to one
($a_n \downarrow 1$) and let $M_n = a_n K$ for each $n \in \mathbb{N}$. We can apply corollary 1 to each $M_n$, obtaining
a sequence $u_n$ of elements of X for which one has

\[ G(u_n) = M_n F(u_n). \tag{1.52.1} \]

Since $(X, d_F)$ is compact, there exists a subsequence of $u_n$ which converges to some $u_\infty$; by
continuity of G and F we can pass to the limit in (1.52.1), obtaining

\[ G(u_\infty) = K F(u_\infty) \]

which completes the proof.


remark 4. For corollary 2 we cannot ensure uniqueness of $u_\infty$, since in general the sequence
$u_n$ may change with the choice of $a_n$, and the limit might be different. So the corollary can
only be applied as an existence theorem.


1.52.2 Near operators 


We can now introduce the concept of near operators and discuss some of their properties. 


A historical remark: Campanato initially introduced the concept in Hilbert spaces;
subsequently, it was remarked that most of the theory could more generally be applied to Banach
spaces; indeed, it was also proven that the basic definition can be generalized to make part
of the theory available in the more general environment of metric vector spaces.


We will here discuss the theory in the case of Banach spaces, with only a couple of exceptions: 
to see some of the extra properties that are available in Hilbert spaces and to discuss a 
generalization of the Lax-Milgram theorem to metric vector spaces. 


1.52.3 Basic definitions and properties 


Definition 3. Let X be a set and Y a Banach space. Let A, B be two operators from X
to Y. We say that A is near B if and only if there exist two constants $\alpha > 0$ and $k \in (0, 1)$
such that, for each $x', x'' \in X$ one has

\[ \| B(x') - B(x'') - \alpha (A(x') - A(x'')) \| \leq k \, \| B(x') - B(x'') \| \]

In other words, A is near B if $B - \alpha A$ is a small perturbation of B for an appropriate value
of $\alpha$.


Observe that in general the property is not symmetric: if A is near B, it is not necessarily
true that B is near A; as we will briefly see, this can only be proven if $\alpha < 1/2$, or in the
case that Y is a Hilbert space, by using an equivalent condition that will be discussed later
on. Yet it is possible to define a topology with some interesting properties on the space of
operators, by using the concept of nearness to form a base.


The core point of the nearness between operators is that it allows us to “transfer” many
important properties from B to A; in other words, if B satisfies certain properties, and A is
near B, then A satisfies the same properties. To prove this, and to enumerate some of these
“nearness-invariant” properties, we will bring out a few important facts.


In what follows, unless differently specified, we will always assume that X is a set, Y is a 
Banach space and A, B are two operators from X to Y. 


Lemma 1. If A is near B then there exist two positive constants $M_1$, $M_2$ such that

\[ \| B(x') - B(x'') \| \leq M_1 \| A(x') - A(x'') \| \]
\[ \| A(x') - A(x'') \| \leq M_2 \| B(x') - B(x'') \| \]

Proof. We have:

\[ \| B(x') - B(x'') \| \leq \| B(x') - B(x'') - \alpha (A(x') - A(x'')) \| + \alpha \| A(x') - A(x'') \| \]
\[ \leq k \| B(x') - B(x'') \| + \alpha \| A(x') - A(x'') \| \]

and hence

\[ \| B(x') - B(x'') \| \leq \frac{\alpha}{1 - k} \| A(x') - A(x'') \| \]

which is the first inequality with $M_1 = \alpha/(1 - k)$ (which is positive since k < 1).


But also

\[ \| A(x') - A(x'') \| \leq \frac{1}{\alpha} \| B(x') - B(x'') - \alpha (A(x') - A(x'')) \| + \frac{1}{\alpha} \| B(x') - B(x'') \| \]
\[ \leq \frac{k}{\alpha} \| B(x') - B(x'') \| + \frac{1}{\alpha} \| B(x') - B(x'') \| \]

and hence

\[ \| A(x') - A(x'') \| \leq \frac{1 + k}{\alpha} \| B(x') - B(x'') \| \]

which is the second inequality with $M_2 = (1 + k)/\alpha$.

The most important corollary of the previous lemma is the following


Corollary 3. If A is near B then two points of X have the same image under A if and only
if they have the same image under B.


We can express the previous concept in the following formal way: for each y in B(X) there
exists z in Y such that $A(B^{-1}(y)) = \{z\}$ and conversely. In yet other words: each fiber of A
is a fiber (for a different point) of B, and conversely.


It is therefore possible to define a map $T_A : B(X) \to Y$ by putting $T_A(y) = z$; the range of
$T_A$ is A(X). Conversely, it is possible to define $T_B : A(X) \to Y$, by putting $T_B(z) = y$; the
range of $T_B$ is B(X). Both maps are injective and, if restricted to their respective ranges,
one is the inverse of the other.


Also observe that $T_B$ and $T_A$ are continuous. This follows from the fact that for each $x \in X$
one has

\[ T_A(B(x)) = A(x), \quad T_B(A(x)) = B(x) \]

and that the lemma ensures that given a sequence $(x_n)$ in X, the sequence $(B(x_n))$ converges
to $B(x_0)$ if and only if $(A(x_n))$ converges to $A(x_0)$.


We can now list some invariant properties of operators with respect to nearness. The
properties are given in the form “if and only if” because each operator is near itself (therefore
ensuring the “only if” part).

1. a map is injective iff it is near an injective operator;

2. a map is surjective iff it is near a surjective operator;

3. a map is open iff it is near an open map;

4. a map has dense range iff it is near a map with dense range.


To prove (2) it is necessary to use theorem 1.


Another important property that follows from the lemma is that if there exists $y \in Y$ such that
$A^{-1}(y) \cap B^{-1}(y) \neq \emptyset$, then $A^{-1}(y) = B^{-1}(y)$: intersecting fibers are equal. (Campanato
only stated this property for the case y = 0 and called it “the kernel property”; I prefer to
call it the “fiber persistence” property.)


A topology based on nearness 


In this section we will show that the concept of nearness between operators can indeed be
connected to a topological understanding of the set of maps from X to Y.

Let M be the set of maps between X and Y. For each $F \in M$ and for each $k \in (0,1)$ we let
$U_k(F)$ be the set of all maps $G \in M$ such that F − G is a small perturbation of F with constant
k. In other words, $G \in U_k(F)$ iff G is near F with constants 1, k.


The set $\mathcal{U}(F) = \{U_k(F) \mid 0 < k < 1\}$ satisfies the axioms of the set of fundamental
neighbourhoods. Indeed:

1. F belongs to each $U_k(F)$;

2. $U_k(F) \subset U_h(F)$ iff $k < h$, and thus the intersection property of neighbourhoods is
trivial;

3. for each $U_k(F)$ there exists $U_h(F)$ such that for each $G \in U_h(F)$ there exists $U_j(G) \subset U_k(F)$.

This last property (permanence of neighbourhoods) is somewhat less trivial, so we shall now
prove it.
Let $U_k(F)$ be given.


Let $U_h(F)$ be another arbitrary neighbourhood of F and let G be an arbitrary element in it.
We then have:

$$\|F(x') - F(x'') - (G(x') - G(x''))\| \le h\,\|F(x') - F(x'')\| \qquad (1.52.2)$$

but also (by the lemma)

$$\|G(x') - G(x'')\| \le (1 + h)\,\|F(x') - F(x'')\|. \qquad (1.52.3)$$

Let also $U_j(G)$ be an arbitrary neighbourhood of G and H an arbitrary element in it. We
then have:

$$\|G(x') - G(x'') - (H(x') - H(x''))\| \le j\,\|G(x') - G(x'')\|. \qquad (1.52.4)$$

The nearness between F and H is calculated as such:

$$\|F(x') - F(x'') - (H(x') - H(x''))\| \le$$
$$\|F(x') - F(x'') - (G(x') - G(x''))\| + \|G(x') - G(x'') - (H(x') - H(x''))\| \le$$
$$h\,\|F(x') - F(x'')\| + j\,\|G(x') - G(x'')\| \le (h + j(1+h))\,\|F(x') - F(x'')\|. \qquad (1.52.5)$$

We then want $h + j(1+h) < k$, that is $j < (k-h)/(1+h)$; the condition $0 < j < 1$ is always
satisfied on the right side, and the left side gives us $h < k$.


It is important to observe that the topology generated this way is not a Hausdorff topology:
indeed, it is not possible to separate F and F + y (where $F \in M$ and y is a constant element
of Y). On the other hand, the subset of all maps with a fixed value at a fixed point
($F(x_0) = y_0$) is a Hausdorff subspace.
Another important characteristic of the topology is that the set H of invertible operators
from X to Y is open in M (because a map is invertible iff it is near an invertible map). This
is not true in the topology of uniform convergence, as is easily seen by choosing $X = Y = \mathbb{R}$
and the sequence with generic element $F_n(x) = x^3 - x/n$: the sequence converges (in the
uniform convergence topology) to $F(x) = x^3$, which is invertible, but none of the $F_n$ is
invertible. Hence F is an element of H which is not in the interior of H, and H is not open.


1.52.4 Some applications


As we mentioned in the introduction, the Campanato theory of near operators allows us
to generalize some important theorems; we will now present some generalizations of the
Lax-Milgram theorem, and a generalization of the Riesz theorem.
[TODO] 


Version: 5 Owner: Oblomov Author(s): Oblomov 


1.53 negative binomial random variable 
X is a negative binomial random variable with parameters r and p if
$$f_X(x) = \binom{r+x-1}{x}\, p^r (1-p)^x, \qquad x \in \{0, 1, 2, \ldots\}$$


Parameters:


* $r > 0$


* $p \in [0,1]$


$X \sim \mathrm{NegBin}(r, p)$


Notes:


1. If $r \in \mathbb{N}$, X is the number of failed Bernoulli trials before the rth success.


Note that if r = 1 the variable is a geometric random variable.
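
The note about r = 1 can be checked numerically. The following is a minimal sketch, assuming integer r (the entry allows any r > 0) and the convention that x counts failures before the rth success; the function name `neg_binomial_pmf` is a hypothetical helper, not part of the entry.

```python
from math import comb, isclose

def neg_binomial_pmf(x: int, r: int, p: float) -> float:
    """P(X = x): x failures before the r-th success (integer r assumed)."""
    return comb(x + r - 1, x) * p**r * (1 - p)**x

# For r = 1 the formula should reduce to the geometric pmf p * (1 - p)**x.
geometric_like = [neg_binomial_pmf(x, 1, 0.3) for x in range(5)]

# The probabilities over x = 0, 1, 2, ... should add up to 1
# (the infinite sum is truncated here at 200 terms).
total = sum(neg_binomial_pmf(x, 3, 0.5) for x in range(200))
```

Comparing `geometric_like` with `0.3 * 0.7**x` term by term illustrates the reduction to the geometric case.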
1.54 normal random variable


X is a normal random variable with parameters $\mu$ and $\sigma^2$ if
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad x \in \mathbb{R}$$


Parameters:


* $\mu \in \mathbb{R}$


* $\sigma^2 > 0$


$X \sim N(\mu, \sigma^2)$


Notes:


1. Probably the most frequently used distribution. $f_X(x)$ will look like a bell-shaped function,
hence justifying the synonym bell distribution.
2. When $\mu = 0$ and $\sigma^2 = 1$ the distribution is called standard normal.
3. The cumulative distribution function of X is often called $\Phi(x)$.
4. $E[X] = \mu$
5. $Var[X] = \sigma^2$
6. $M_X(t) = e^{\mu t + \sigma^2 t^2 / 2}$

Version: 4 Owner: Riemann Author(s): Riemann


1.55 normalizer of a subset of a group 


normalizer of a subset of a group 


1: X group
2: $Y \subset X$ subset
3: $\{x \in X \mid xYx^{-1} = Y\}$


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: drini Author(s): apmxi 


1.56 nth root 


There are two often-used definitions of the nth root. The first discussed deals with real numbers
only; the second deals with complex numbers.


The nth root of a non-negative real number x, written as $\sqrt[n]{x}$, can be defined as the real
number y such that $y^n = x$. This notation is normally, but not always, used when n is a
natural number. This definition could also be written as $\sqrt[n]{x^n} \equiv x\ \ \forall x \ge 0 \in \mathbb{R}$.


Example: $\sqrt[4]{81} = 3$ because $3^4 = 3 \times 3 \times 3 \times 3 = 81$.


Example: $\sqrt[5]{x^5 + 5x^4 + 10x^3 + 10x^2 + 5x + 1} = x + 1$ because $(x+1)^5 = (x^2 + 2x + 1)^2 (x+1) = x^5 + 5x^4 + 10x^3 + 10x^2 + 5x + 1$. (See the binomial theorem and Pascal’s Triangle.)


The nth root operation is distributive for multiplication and division, but not for addition
and subtraction. That is, $\sqrt[n]{x \times y} = \sqrt[n]{x} \times \sqrt[n]{y}$, and $\sqrt[n]{x/y} = \sqrt[n]{x} / \sqrt[n]{y}$. However, except in special
cases, $\sqrt[n]{x + y} \neq \sqrt[n]{x} + \sqrt[n]{y}$ and $\sqrt[n]{x - y} \neq \sqrt[n]{x} - \sqrt[n]{y}$.


Example: $\sqrt[4]{\frac{81}{625}} = \frac{3}{5}$ because $\left(\frac{3}{5}\right)^4 = \frac{3^4}{5^4} = \frac{81}{625}$.


The nth root notation is actually an alternative to exponentiation. That is, $\sqrt[n]{x} \equiv x^{1/n}$. As
such, the nth root operation is associative with exponentiation. That is, $\sqrt[n]{x^3} = x^{3/n} = \left(\sqrt[n]{x}\right)^3$.


In this definition, $\sqrt[n]{x}$ is undefined when x < 0 and n is even. When n is odd and x < 0,
$\sqrt[n]{x} < 0$. Examples: $\sqrt[3]{-1} = -1$, but $\sqrt[4]{-1}$ is undefined for this definition.


A more generalized definition: The nth roots of a complex number $t = x + yi = (x, yi) = (r, \theta)$
are all the complex numbers $z_1, z_2, \ldots, z_n \in \mathbb{C}$ that satisfy the condition $z_k^n = t$. n such
complex numbers always exist (for $t \neq 0$).
One of the more popular methods of finding these roots is through geometry and trigonometry.
The complex numbers are treated as a plane using Cartesian coordinates with an x axis
and a yi axis. (Remember, in the context of complex numbers, $i = \sqrt{-1}$.) These rectangular
coordinates (x, yi) are then translated to polar coordinates $(r, \theta)$, where $r = \sqrt{x^2 + y^2}$
(according to the previous definition of nth root), $\theta = \frac{\pi}{2}$ if x = 0, and $\theta = \arctan\frac{y}{x}$ if $x \neq 0$.


(See the ) 


Then the nth roots of t are the vertices of a regular polygon having n sides, centered at
(0, 0), and having $(\sqrt[n]{r}, \theta/n)$ as one of its vertices, where $(r, \theta)$ are the polar coordinates of t.


Example: Consider $\sqrt[3]{8}$. 8 can also be written as 8 + 0i or in polar as (8, 0). By our method, we
now have an equilateral triangle centered at (0, 0) and having one vertex at (2, 0). Knowing
that a complete circle consists of $2\pi$ radians, and knowing that all angles are equal in an
equilateral triangle, we can deduce that the other two vertices lie at polar coordinates $(2, \frac{2\pi}{3})$
and $(2, \frac{4\pi}{3})$. Translating back into rectangular coordinates, we have:


$\sqrt[3]{8} = 2$

$\sqrt[3]{8} = 2(\cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3}) = 2(-\frac{1}{2} + i\frac{\sqrt{3}}{2}) = -1 + i\sqrt{3}$

$\sqrt[3]{8} = 2(\cos\frac{4\pi}{3} + i\sin\frac{4\pi}{3}) = 2(-\frac{1}{2} - i\frac{\sqrt{3}}{2}) = -1 - i\sqrt{3}$

Example: Consider $\sqrt[4]{-16}$. We can rewrite this as $\sqrt[4]{-1} \times \sqrt[4]{16} = 2\sqrt{i}$.


We can find $2\sqrt{i}$ by using a formula for multiplying complex numbers in polar coordinates:
$(r_1, \theta_1) \times (r_2, \theta_2) = (r_1 r_2, \theta_1 + \theta_2)$. So $0 + i = (r^2, 2\theta)$. Therefore, $r = \sqrt[4]{0^2 + 1^2} = 1$ and
$\theta = \frac{\pi}{4}$. So $\sqrt{i} = (1, \frac{\pi}{4})$, and doubling the modulus we get $(2, \frac{\pi}{4})$.


Now we have a square centered at polar coordinates (0, 0) with one corner at $(2, \frac{\pi}{4})$. Adding
$\frac{\pi}{2}$ to the angle repeatedly gives us the remainder of the corners: $(2, \frac{3\pi}{4})$, $(2, \frac{5\pi}{4})$, $(2, \frac{7\pi}{4})$.
Translating these to rectangular coordinates works as in the previous example.


So the four solutions to $\sqrt[4]{-16}$ are $\sqrt{2} + i\sqrt{2}$, $-\sqrt{2} + i\sqrt{2}$, $-\sqrt{2} - i\sqrt{2}$, and $\sqrt{2} - i\sqrt{2}$.


Example: Consider $\sqrt[3]{1 + i}$. As in the previous examples, our first step is to convert 1 + 1i
into polar coordinates. We get $r = \sqrt{1^2 + 1^2} = \sqrt{2}$ and $\theta = \arctan 1 = \frac{\pi}{4}$, giving a polar
coordinate of $(\sqrt{2}, \frac{\pi}{4})$. Now we take the cube root of this complex number: $(\sqrt{2}, \frac{\pi}{4}) = (r^3, 3\theta)$.


We get coordinates $(\sqrt[6]{2}, \frac{\pi}{12})$. This point is one vertex of an equilateral triangle centered at
(0, 0). The other two vertices of the triangle are derived from adding $\frac{2\pi}{3}$ to $\theta$. We know this
because lines from the center of an equilateral triangle to each of the corners will form three
equal angles of width $\frac{2\pi}{3}$ about the center, and because all three vertices of an equilateral
triangle will be the same distance from the center.

So the other vertices in polar coordinates are $(\sqrt[6]{2}, \frac{3\pi}{4})$ and $(\sqrt[6]{2}, \frac{17\pi}{12})$. Most people would
just use a calculator to compute the cosine and sine of these angles, but they can be
worked out by hand using these handy identities:

$\cos 2t = 1 - 2\sin^2 t$ (use this to calculate $\sin(\frac{\pi}{12})$ from $\sin(\frac{\pi}{6}) = \frac{1}{2}$)
sin(a + b) = sin(a) cos(b) + cos(a) sin(b) (use a = # and b = =) 


cos(a + b) = cos(a) cos(b) — sin(a) sin (b) 


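The process of calculating these values is left as an exercise to the reader in the interest of
space. The rectangular coordinates, the cube roots of 1 + i, are:


$(\sqrt[6]{2}, \frac{\pi}{12}) = \sqrt[6]{2}\,\frac{\sqrt{6} + \sqrt{2}}{4} + i\,\sqrt[6]{2}\,\frac{\sqrt{6} - \sqrt{2}}{4}$

$(\sqrt[6]{2}, \frac{3\pi}{4}) = -\frac{1}{\sqrt[3]{2}} + i\,\frac{1}{\sqrt[3]{2}}$

$(\sqrt[6]{2}, \frac{17\pi}{12}) = \sqrt[6]{2}\,\frac{\sqrt{2} - \sqrt{6}}{4} - i\,\sqrt[6]{2}\,\frac{\sqrt{2} + \sqrt{6}}{4}$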
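
The polygon construction above can be sketched in code. This is a minimal illustration rather than a definitive implementation; the function name `nth_roots` is a hypothetical helper, and it uses the argument returned by Python's `cmath.polar` (which lies in (−π, π]) instead of the arctan formula in the text.

```python
import cmath
import math

def nth_roots(t: complex, n: int) -> list[complex]:
    """Return the n complex nth roots of t, starting from the principal one."""
    r, theta = cmath.polar(t)      # modulus and argument of t
    root_r = r ** (1.0 / n)        # real nth root of the modulus
    # Successive roots are separated by an angle of 2*pi/n, so they are the
    # vertices of a regular n-gon centered at the origin.
    return [
        cmath.rect(root_r, (theta + 2 * math.pi * k) / n)
        for k in range(n)
    ]

cube_roots_of_8 = nth_roots(8, 3)
fourth_roots_of_minus_16 = nth_roots(-16, 4)
cube_roots_of_1_plus_i = nth_roots(1 + 1j, 3)
```

Raising each returned value to the nth power recovers t up to floating-point rounding, which is a convenient way to check the three worked examples.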


Version: 8 Owner: mathcam Author(s): mathcam, wberry 


1.57 null tree 


A null tree is simply a tree with zero nodes.


Version: 1 Owner: Logan Author(s): Logan 


1.58 open ball 


Let $(X, \rho)$ be a metric space and $x_0 \in X$. Let r be a positive number. The set


$$B(x_0, r) = \{x \in X : \rho(x, x_0) < r\}$$


is called the ball with center $x_0$ and radius r. On some spaces like $\mathbb{C}$ or $\mathbb{R}^2$ this is also known
as an open disk and when the space is R, it is known as open interval (all three spaces with 


Version: 2 Owner: drini Author(s): drini, apmxi 


1.59 opposite ring 


If R is a ring, then we may construct the opposite ring $R^{op}$ which has the same underlying
abelian group but with multiplication in the opposite order: the product of $r_1$ and
$r_2$ in $R^{op}$ is $r_2 r_1$.

If M is a left R-module, then it can be made into a right $R^{op}$-module, where an
element m, when multiplied on the right by an element r of $R^{op}$, yields the rm that we
have with our left R-module action on M. Similarly, right R-modules can be made into left
$R^{op}$-modules.


If R is a commutative ring, then it is equal to its own opposite ring.


Version: 1 Owner: antizeus Author(s): antizeus 


1.60 orbit-stabilizer theorem 


Given a group action of G on a set X, define Gx to be the orbit of x and $G_x$ to be the set of
stabilizers of x. For each $x \in X$ the correspondence $g(x) \mapsto gG_x$ is a bijection between Gx
and the set of left cosets of $G_x$.


A famous corollary is that
$$|Gx| \cdot |G_x| = |G| \quad \forall x \in X$$
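
The corollary can be verified by brute force on a small example. The sketch below assumes the symmetric group on three points acting on X = {0, 1, 2} in the natural way; the helper names `orbit` and `stabilizer` are illustrative only.

```python
from itertools import permutations

# Each group element g is a tuple with g[x] the image of x.
X = range(3)
G = list(permutations(X))

def orbit(x):
    """The set of points reachable from x: {g(x) : g in G}."""
    return {g[x] for g in G}

def stabilizer(x):
    """The group elements that fix x."""
    return [g for g in G if g[x] == x]

# |Gx| * |G_x| should equal |G| for every point x.
checks = [len(orbit(x)) * len(stabilizer(x)) == len(G) for x in X]
```

Here every orbit has 3 points and every stabilizer has 2 elements, matching |G| = 6.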


Version: 8 Owner: vitriol Author(s): vitriol 


1.61 orthogonal 


The definition of orthogonal varies depending on the mathematical constructs in question. 
There are particular definitions for 


• orthogonal matrices
• orthogonal polynomials
• orthogonal vectors


In general, two objects are orthogonal if they do not “coincide” in some sense. Sometimes 
orthogonal means roughly the same thing as “perpendicular”. 


Version: 2 Owner: akrowne Author(s): akrowne 


1.62 permutation group on a set 
permutation group on a set 
1: A set

2: $(S_A, \circ)$ symmetric group
3: $X \le S_A$

4: $(X, \circ)$


fact: conjugating stabilizer of an element by permutation produces
stabilizer of permuted element


1: A set

2: $a \in A$

3: X permutation group on A
4: $\sigma \in X$

5: $\sigma\,\mathrm{Stab}_X(a)\,\sigma^{-1} = \mathrm{Stab}_X(\sigma(a))$


fact: if a permutation group acts transitively, then the intersection
of conjugated stabilizers is the identity


1: A set
2: $a \in A$
3: X permutation group on A


4: $\bigcap_{\sigma \in X} \sigma\,\mathrm{Stab}_X(a)\,\sigma^{-1} = 1$


Note: This is a “seed” entry written using a short-hand format described in this FAQ.


Version: 1 Owner: bwebste Author(s): apmxi 


1.63 prime element 


An element p in a ring R is a prime element if it generates a prime ideal. If R is commutative,
this is equivalent to saying that for all $a, b \in R$, if p divides ab, then p divides a or p divides b.

When $R = \mathbb{Z}$ the prime elements as formulated above are simply prime numbers.


Version: 3 Owner: dublisk Author(s): dublisk 


1.64 product measure 


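Let $(E_1, \mathcal{B}_1(E_1))$ and $(E_2, \mathcal{B}_2(E_2))$ be two measurable spaces with measures $\mu_1$ and $\mu_2$. Let
$\mathcal{B}_1 \times \mathcal{B}_2$ be the sigma algebra on $E_1 \times E_2$ generated by subsets of the form $B_1 \times B_2$, where
$B_1 \in \mathcal{B}_1(E_1)$ and $B_2 \in \mathcal{B}_2(E_2)$.


The product measure $\mu_1 \times \mu_2$ is defined to be the unique measure on the measurable space
$(E_1 \times E_2, \mathcal{B}_1 \times \mathcal{B}_2)$ satisfying the property
$$\mu_1 \times \mu_2(B_1 \times B_2) = \mu_1(B_1)\,\mu_2(B_2) \quad \text{for all } B_1 \in \mathcal{B}_1(E_1),\ B_2 \in \mathcal{B}_2(E_2).$$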


Version: 2 Owner: djao Author(s): djao 


1.65 projective line 
projective line 


example 


1: $\ell = \{[X, Y, Z, W] \in \mathbb{RP}^3 \mid Z = W = 0\}$
Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: bwebste Author(s): apmxi 


1.66 projective plane 
projective plane 


1: $\sim\,: S^2 \times S^2 \to \{0, 1\}$


2: $x \sim y \Leftrightarrow y = -x$
3: $p : S^2 \to S^2/\!\sim$
4: quotient space obtained from p


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 2 Owner: bhaire Author(s): bhaire, apmxi 


1.67 proof of calculus theorem used in the Lagrange 
method 


Let f(x) and $g_i(x)$, i = 1, ..., m be differentiable scalar functions; $x \in \mathbb{R}^n$.


We will find local extremes of the function f(x) where $\nabla f = 0$. This can be proved by
contradiction:


$$\nabla f \neq 0$$
$$\Leftrightarrow \exists \epsilon_0 > 0, \forall \epsilon;\ 0 < \epsilon < \epsilon_0 : f(x - \epsilon\nabla f) < f(x) < f(x + \epsilon\nabla f)$$


but then f(x) is not a local extreme.


Now we put up some conditions, such that we should find the $x \in S \subset \mathbb{R}^n$ that gives a local
extreme of f. Let $S = \bigcap_{i=1}^{m} S_i$, and let $S_i$ be defined so that $g_i(x) = 0\ \forall x \in S_i$.


Any vector $x \in \mathbb{R}^n$ can have one component perpendicular to the subset $S_i$ (for visualization,
think n = 3 and let $S_i$ be a flat surface). $\nabla g_i$ will be perpendicular to $S_i$, because:


$$\exists \epsilon_0 > 0, \forall \epsilon;\ 0 < \epsilon < \epsilon_0 : g_i(x - \epsilon\nabla g_i) < g_i(x) < g_i(x + \epsilon\nabla g_i)$$


But $g_i(x) = 0$, so any vector $x + \epsilon\nabla g_i$ must be outside $S_i$, and also outside S. (todo: I have
proved that there might exist a component perpendicular to each subset $S_i$, but not that
there exists only one; this should be done)


By the argument above, $\nabla f$ must be zero - but now we can ignore all components of $\nabla f$
perpendicular to S. (todo: this should be expressed more formally and proved)


So we will have a local extreme within $S_i$ if there exists a $\lambda_i$ such that


$$\nabla f = \lambda_i \nabla g_i$$


We will have local extreme(s) within S where there exists a set $\lambda_i$, i = 1, ..., m such that


$$\nabla f = \sum_{i=1}^{m} \lambda_i \nabla g_i$$

Version: 2 Owner: tobix Author(s): tobix 
1.68 proof of orbit-stabilizer theorem


The correspondence is surjective. It is injective because if $gG_x = g'G_x$ then $g = g'h$
for some $h \in G_x$. Therefore $g(x) = g'(h(x)) = g'(x)$.


Version: 1 Owner: vitriol Author(s): vitriol 


1.69 proof of power rule 


The power rule can be derived by repeated application of the product rule.


Proof for all positive integers n


The power rule has been shown to hold for n = 0 and n = 1. If the power rule is known to
hold for some k > 0, then we have

$$\frac{D}{Dx}x^{k+1} = \frac{D}{Dx}\left(x \cdot x^k\right) = x\,\frac{D}{Dx}x^k + x^k\,\frac{D}{Dx}x = x \cdot kx^{k-1} + x^k = (k+1)x^k$$

Thus the power rule holds for all positive integers n.


Proof for all positive rationals n


Let $y = x^{p/q}$. We need to show
$$\frac{D}{Dx}\left(x^{p/q}\right) = \frac{p}{q}\,x^{p/q-1} \qquad (1.69.1)$$


The proof of this comes from implicit differentiation.


By definition, we have $y^q = x^p$. We now take the derivative with respect to x on both sides
of the equality.


$$\frac{D}{Dx}y^q = \frac{D}{Dx}x^p$$
$$q\,y^{q-1}\,\frac{Dy}{Dx} = p\,x^{p-1}$$
$$\frac{Dy}{Dx} = \frac{p}{q}\,\frac{x^{p-1}}{y^{q-1}}$$
$$= \frac{p}{q}\,x^{p-1}\,y^{1-q}$$
$$= \frac{p}{q}\,x^{p-1}\,x^{p/q-p}$$
$$= \frac{p}{q}\,x^{p-1+p/q-p}$$
$$= \frac{p}{q}\,x^{p/q-1}$$


Proof for all positive irrationals n


For positive irrationals we claim continuity due to the fact that (1.69.1) holds for all positive
rationals, and there are positive rationals that approach any positive irrational.


Proof for negative powers n


We again employ implicit differentiation. Let u = x, and differentiate $u^{-n}$ with respect to x
for some non-negative n. We must show


$$\frac{Du^{-n}}{Dx} = -n\,u^{-n-1} \qquad (1.69.2)$$


By definition we have $u^n u^{-n} = 1$. We begin by taking the derivative with respect to x on
both sides of the equality. By application of the product rule we get

$$\frac{D}{Dx}\left(u^n u^{-n}\right) = \frac{D}{Dx}1$$
$$n\,u^{n-1}u^{-n} + u^n\,\frac{Du^{-n}}{Dx} = 0$$
$$u^n\,\frac{Du^{-n}}{Dx} = -n\,u^{-1}$$
$$\frac{Du^{-n}}{Dx} = -n\,u^{-n-1}$$
$$\frac{Dx^{-n}}{Dx} = -n\,x^{-n-1}$$
Version: 3 Owner: alek_thiery Author(s): alek_thiery, Logan 


1.70 proof of primitive element theorem 


Let $P_a \in F[x]$, respectively $P_b \in F[x]$, be the minimal polynomial satisfied by a,
respectively b. If E is an extension of F that splits $P_a P_b$, then E is normal over F, and so
there are a finite number of subfields of E containing F, as many as there are subgroups of
Gal(E/F), by the Fundamental Theorem of Galois Theory. Let $c_k = a + kb$ with $k \in F$, and
consider the fields $F(c_k)$. Since F is of characteristic 0, there are infinitely many choices for k.


But $F \subset F(c_k) \subset F(a, b) \subset E$, so by the above there are only finitely many $F(c_k)$. Therefore,
for some distinct $k_i, k_j \in F$, $F(c_{k_i}) = F(c_{k_j})$. Then $c_{k_j} \in F(c_{k_i})$, and so $c_{k_i} - c_{k_j} = (k_i - k_j)b \in F(c_{k_i})$,
and thus $b \in F(c_{k_i})$. Then also $a = c_{k_i} - k_i b \in F(c_{k_i})$, which gives $F(a, b) \subset F(c_{k_i})$. But we
also have $F(c_{k_i}) \subset F(a, b)$, and thus $F(a, b) = F(c_{k_i})$, QED.


Version: 1 Owner: sucrose Author(s): sucrose 


1.71 proof of product rule 


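$$\frac{D}{Dx}\left[f(x)g(x)\right] = \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h}$$
$$= \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x+h)g(x) + f(x+h)g(x) - f(x)g(x)}{h}$$
$$= \lim_{h \to 0} \left[ f(x+h)\,\frac{g(x+h) - g(x)}{h} + g(x)\,\frac{f(x+h) - f(x)}{h} \right]$$
$$= f(x)g'(x) + f'(x)g(x)$$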
Version: 1 Owner: Logan Author(s): Logan 
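1.72 proof of sum rule


$$\frac{D}{Dx}\left(f(x) + g(x)\right) = \lim_{h \to 0} \frac{f(x+h) + g(x+h) - f(x) - g(x)}{h}$$
$$= \lim_{h \to 0} \left( \frac{f(x+h) - f(x)}{h} + \frac{g(x+h) - g(x)}{h} \right)$$
$$= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} + \lim_{h \to 0} \frac{g(x+h) - g(x)}{h}$$
$$= f'(x) + g'(x)$$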


Version: 1 Owner: Logan Author(s): Logan 


1.73 proof that countable unions are countable 


Let C be a countable collection of countable sets. We will show that $\bigcup C$ is countable.


Let P be the set of positive primes. P is countably infinite, so there is a bijection between
P and $\mathbb{N}$. Since there is a bijection between C and a subset of $\mathbb{N}$, there must in turn be a
one-to-one function $f : C \to P$.


Each $S \in C$ is countable, so there exists a bijection between S and some subset of $\mathbb{N}$. Call
this function g (taking its values to be positive integers, so that no power below equals 1),
and define a new function $h_S : S \to \mathbb{N}$ such that for all $x \in S$,


$$h_S(x) = f(S)^{g(x)}$$


Note that $h_S$ is one-to-one. Also note that for any distinct pair $S, T \in C$, the range of $h_S$
and the range of $h_T$ are disjoint due to the fundamental theorem of arithmetic.


We may now define a one-to-one function $h : \bigcup C \to \mathbb{N}$, where, for each $x \in \bigcup C$, $h(x) = h_S(x)$
for some $S \in C$ where $x \in S$ (the choice of S is irrelevant, so long as it contains x).
Since the range of h is a subset of $\mathbb{N}$, h is a bijection onto that set and hence $\bigcup C$ is countable.


Version: 2 Owner: vampyr Author(s): vampyr 


1.74 quadrature 


Quadrature is the computation of a univariate definite integral. It can refer to either
numerical or analytic techniques; one must gather from context which is meant.


Cubature refers to higher-dimensional definite integral computation.


Some numerical quadrature methods include the trapezoidal rule and Riemann sums.
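
As an illustration of numerical quadrature, here is a minimal sketch of the composite trapezoidal rule with equally spaced nodes. The function name `trapezoid` and the default of 1000 subintervals are assumptions made for this example only.

```python
import math

def trapezoid(f, a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of f over [a, b] using n equal trapezoids."""
    h = (b - a) / n
    # Interior nodes are counted once; the two endpoints get weight 1/2.
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

# The integral of sin over [0, pi] is exactly 2.
approx = trapezoid(math.sin, 0.0, math.pi)

# The rule is exact (up to rounding) for linear integrands.
linear = trapezoid(lambda x: 2 * x, 0.0, 1.0, n=10)
```

For a smooth integrand the error shrinks roughly like 1/n², so doubling n cuts the error to about a quarter.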


Version: 4 Owner: akrowne Author(s): akrowne 
1.75 quotient module


quotient module


1: X is a ring

2: Y a module over X

3: Z is a submodule of Y

4: Y/Z is the additive group of cosets of Z in Y
5: $x(y + Z) = xy + Z$ module structure


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: Thomas Heye Author(s): apmxi 


1.76 regular expression 


A regular expression is a particular metasyntax for specifying regular grammars, which
has many useful applications.


While variations abound, fundamentally a regular expression consists of the following components.
Parentheses can be used for grouping and nesting, and must contain a fully-formed regular
expression. The | symbol can be used for denoting alternatives. Some specifications do not
provide nesting or alternatives. There are also a number of postfix operators. The ? operator
means that the preceding element can either be present or non-present, and corresponds to a
rule of the form A → B | λ. The * operator means that the preceding element can be present
zero or more times, and corresponds to a rule of the form A → BA | λ. The + operator
means that the preceding element can be present one or more times, and corresponds to a
rule of the form A → BA | B. Note that while these rules are not immediately in the form
required of a regular grammar, they can be transformed so that they are.


Here is an example of a regular expression that specifies a grammar that generates the binary
representations of all multiples of 3 (and only multiples of 3).


(0*(1(01*0)*1)*)*0* 


This specifies the grammar (in BNF):
::= AB
= CD 
::= OBA 
::= OC|A 
ee 1E1 
::= FE|X 
::= 0GO 
::= 1GJà 


QyhoQwea wF 
i 


A little further work is required to transform this grammar into an acceptable form for
regular grammars, but it can be shown that this grammar (and any grammar specified by a
regular expression) is equivalent to some regular grammar.
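
The multiples-of-3 expression given above can be checked empirically. The sketch below assumes Python's `re` module, whose syntax happens to accept the expression unchanged, and tests it against the binary representations of 0 through 255; the helper name `is_multiple_of_3` is illustrative only.

```python
import re

# The expression from the text, applied to whole strings via fullmatch().
MULTIPLE_OF_3 = re.compile(r"(0*(1(01*0)*1)*)*0*")

def is_multiple_of_3(bits: str) -> bool:
    """True if the binary string bits is accepted by the expression."""
    return MULTIPLE_OF_3.fullmatch(bits) is not None

# Collect every n whose binary form disagrees with the arithmetic test.
mismatches = [
    n for n in range(256)
    if is_multiple_of_3(format(n, "b")) != (n % 3 == 0)
]
```

An empty `mismatches` list means the expression agrees with `n % 3 == 0` on this range. The inner part `1(01*0)*1` traces the remainder after reading each bit from 0 to 1, through any number of excursions to 2, and back to 0.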


Regular expressions have many applications. Quite often they are used for powerful search
and substitution features in many text editors and programming languages.


Version: 1 Owner: Logan Author(s): Logan 


1.77 regular language 


A regular grammar is a context-free grammar where all productions must take one of the
following forms (specified here in BNF, λ is the empty string):


<non-terminal> ::= terminal
<non-terminal> ::= terminal <non-terminal>
<non-terminal> ::= λ


A regular language is the set of strings generated by a regular grammar. Regular grammars
are also known as Type-3 grammars in the Chomsky hierarchy.


A regular grammar can be represented by a deterministic or non-deterministic finite automaton.
Such automata can serve to either generate or accept strings in a particular regular
language. Note that since the set of regular languages is a subset of the set of context-free
languages, any deterministic or non-deterministic finite automaton can be simulated by a
pushdown automaton.


Version: 2 Owner: Logan Author(s): Logan 
1.78 right function notation


We are said to be using right function notation if we write functions to the right of their
arguments. That is, if $\alpha : X \to Y$ is a function and $x \in X$, then $x\alpha$ is the image of x under
$\alpha$.


Furthermore, if we have a function $\beta : Y \to Z$, then we write the composition of the two
functions as $\alpha\beta : X \to Z$, and the image of x under the composition as $x\alpha\beta = x(\alpha\beta) = (x\alpha)\beta$.

Compare this to left function notation.


Version: 1 Owner: antizeus Author(s): antizeus 


1.79 ring homomorphism 


Let R and S be rings. A ring homomorphism is a function $f : R \to S$ such that:


• $f(a + b) = f(a) + f(b)$ for all $a, b \in R$
• $f(a \cdot b) = f(a) \cdot f(b)$ for all $a, b \in R$


When working in a context in which all rings have a multiplicative identity, one also requires
that $f(1_R) = 1_S$.


Version: 3 Owner: djao Author(s): djao 


1.80 scalar 


A scalar is a quantity that is invariant under coordinate transformation, also known as a
tensor of rank 0. For example, the number 1 is a scalar, so is any number or variable $n \in \mathbb{R}$.
The point (3, 4) is not a scalar because it is variable under rotation. As such, a scalar can
be an element of a field over which a vector space is defined.
Version: 3 Owner: slider142 Author(s): slider142 


1.81 schrodinger operator 
schrodinger operator 
1: $V : \mathbb{R} \to \mathbb{R}$


2: $y \mapsto -y'' + V(x)y$


Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 1 Owner: bwebste Author(s): apmxi 


1.82 selection sort 


The Problem 


See the Sorting Problem. 


The Algorithm 


Suppose $L = \{x_1, x_2, \ldots, x_n\}$ is the initial list of unsorted elements. The selection sort
algorithm sorts this list in n − 1 steps. At each step i, find the largest element L[j] such that
j ≤ n − i + 1, and swap it with the element at L[n − i + 1]. So, for the first step, find the
largest value in the list and swap it with the last element in the list. For the second step,
find the largest value in the list up to (but not including) the last element, and swap it with
the next to last element. This is continued for n − 1 steps. Thus the selection sort algorithm
is a very simple, in-place sorting algorithm.


Pseudocode 


Algorithm SELECTION_SORT(L, n)
Input: A list L of n elements
Output: The list L in sorted order
begin
    for i ← n downto 2 do
    begin
        temp ← L[i]
        max ← 1
        for j ← 2 to i do
            if L[j] > L[max] then
                max ← j
        L[i] ← L[max]
        L[max] ← temp
    end
end
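
The pseudocode above can be rendered as runnable code. The following is a sketch in Python, assuming zero-based indexing (the pseudocode is one-based) and a tuple swap in place of the temp variable; the name `selection_sort` is chosen for illustration.

```python
def selection_sort(items: list) -> list:
    """Sort items in place in ascending order and return the same list."""
    for last in range(len(items) - 1, 0, -1):
        # Find the index of the largest element in items[0..last].
        largest = 0
        for j in range(1, last + 1):
            if items[j] > items[largest]:
                largest = j
        # Move it to the end of the still-unsorted prefix.
        items[last], items[largest] = items[largest], items[last]
    return items

data = [5, 2, 9, 1, 5, 6]
selection_sort(data)
```

After the call, `data` holds the same elements in ascending order; no auxiliary list is allocated, which is what "in-place" means here.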
Analysis


The selection sort algorithm has the same runtime for any set of n elements, no matter
what the values or order of those elements are. Finding the maximum element of a list of
i elements requires i − 1 comparisons. Thus T(n), the number of comparisons required to
sort a list of n elements with the selection sort, can be found:


$$T(n) = \sum_{i=2}^{n} (i - 1)$$
$$= \sum_{i=1}^{n} i - n$$
$$= \frac{n^2 - n}{2}$$
$$= O(n^2)$$


However, the number of data movements is the number of swaps required, which is n − 1. This
algorithm is very similar to the insertion sort algorithm. It requires fewer data movements,
but requires more comparisons.
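
The counts in this analysis can be confirmed by instrumenting the algorithm. The sketch below is a hypothetical helper, `count_operations`, that sorts a copy of its input with selection sort and returns how many comparisons and swaps were performed.

```python
def count_operations(items: list) -> tuple[int, int]:
    """Selection-sort a copy of items; return (comparisons, swaps)."""
    work = list(items)
    comparisons = swaps = 0
    for last in range(len(work) - 1, 0, -1):
        largest = 0
        for j in range(1, last + 1):
            comparisons += 1
            if work[j] > work[largest]:
                largest = j
        # One swap per step, even when the largest element is already in place.
        work[last], work[largest] = work[largest], work[last]
        swaps += 1
    return comparisons, swaps

n = 10
sorted_input = count_operations(list(range(n)))
reversed_input = count_operations(list(range(n, 0, -1)))
```

Both inputs give the same pair of counts, n(n − 1)/2 comparisons and n − 1 swaps, illustrating that the work does not depend on the initial order.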


Version: 1 Owner: Logan Author(s): Logan 


1.83 semiring 


A semiring is an algebraic structure $(A, \cdot, +, 0, 1)$ on a set A, where 0 and 1 are constants, $(A, \cdot, 1)$ is
a monoid, $(A, +, 0)$ is a commutative monoid, $\cdot$ distributes over + from the left and right,
and 0 is both a left and right annihilator (0a = a0 = 0). Often $a \cdot b$ is written simply as ab,
and the semiring $(A, \cdot, +, 0, 1)$ as simply A.


The relation $\le$ on a semiring A is defined as $a \le b$ if and only if there exists some $c \in A$
such that a + c = b, and is a quasiordering. If + is idempotent over A (that is, a + a = a
holds for all $a \in A$), then $\le$ is a partial ordering.


Addition and (left and right) multiplication are with respect to <, with 
0 as the 


Version: 2 Owner: Logan Author(s): Logan 
1.84 simple function


Let $(X, \mathfrak{B})$ be a measurable space. Let $\chi_{A_k}$, k = 1, 2, ..., n be the characteristic functions
of sets $A_k \in \mathfrak{B}$. We call h a simple function if it can be written as


$$h = \sum_{k=1}^{n} c_k \chi_{A_k}, \quad c_k \in \mathbb{R} \qquad (1.84.1)$$


for some $n \in \mathbb{N}$.
Version: 2 Owner: drummond Author(s): drummond 
1.85 simple path 


A simple path in a graph is a path that contains no vertex more than once. By definition,
cycles are particular instances of simple paths.


Version: 1 Owner: Logan Author(s): Logan 


1.86 solutions of an equation 
solutions of an equation 


1: {z | f(x) =0} 


Note: This is a “seed” entry written using a short-hand format described in|this FAQ, 
Version: 1 Owner: Thomas Heye Author(s): apmxi 
1.87 spanning tree 


A spanning tree of a (connected) graph G is a connected, acyclic subgraph of G that
contains all of the vertices of G. Below is an example of a spanning tree T, where the edges
in T are drawn as solid lines and the edges in G but not in T are drawn as dotted lines.

For any tree there is exactly one spanning tree: the tree itself.


Version: 2 Owner: Logan Author(s): Logan 


1.88 square root 

The square root of a non-negative real number x, written as $\sqrt{x}$, is the real number y such
that $y^2 = x$. Equivalently, $\sqrt{x^2} \equiv x$. Or, $\sqrt{x} \times \sqrt{x} \equiv x$.

Example: $\sqrt{9} = 3$ because $3^2 = 3 \times 3 = 9$.

Example: $\sqrt{x^2 + 2x + 1} = x + 1$ because $(x+1)^2 = (x+1)(x+1) = x^2 + x + x + 1 = x^2 + 2x + 1$.


In some situations it is better to allow two values for $\sqrt{x}$. For example, $\sqrt{4} = \pm 2$ because
$2^2 = 4$ and $(-2)^2 = 4$.


The square root operation is distributive for multiplication and division, but not for addition
and subtraction.


That is, $\sqrt{x \times y} = \sqrt{x} \times \sqrt{y}$, and $\sqrt{x/y} = \sqrt{x}/\sqrt{y}$.
However, in general, $\sqrt{x + y} \neq \sqrt{x} + \sqrt{y}$ and $\sqrt{x - y} \neq \sqrt{x} - \sqrt{y}$.


Example: $\sqrt{x^2 y^2} = xy$ because $(xy)^2 = xy \times xy = x \times x \times y \times y = x^2 \times y^2 = x^2 y^2$.


Example: $\sqrt{\frac{9}{25}} = \frac{3}{5}$ because $\left(\frac{3}{5}\right)^2 = \frac{3^2}{5^2} = \frac{9}{25}$.

The square root notation is actually an alternative to exponentiation. That is, $\sqrt{x} \equiv x^{1/2}$. As
such, the square root operation is associative with exponentiation. That is, $\sqrt{x^3} = x^{3/2} = \left(\sqrt{x}\right)^3$.


Negative real numbers do not have real square roots. For example, $\sqrt{-4}$ is not a real number.
Proof by contradiction: Suppose $\sqrt{-4} = x \in \mathbb{R}$. If x is negative, $x^2$ is positive. But if x is
positive, $x^2$ is also positive. But x cannot be zero either, because $0^2 = 0$. So $\sqrt{-4} \notin \mathbb{R}$.


For additional discussion of the square root and negative numbers, see the discussion of 


Numbers. 


Version: 9 Owner: wberry Author(s): wberry 


1.89 stable sorting algorithm 


A stable sorting algorithm is any sorting algorithm that preserves the relative ordering of
items with equal values. For instance, consider a list of ordered pairs L := {(A, 3), (B, 5), (C, 2), (D, 5), (E, 4)}.
If a stable sorting algorithm sorts L on the second value in each pair using the < relation,
then the result is guaranteed to be {(C, 2), (A, 3), (E, 4), (B, 5), (D, 5)}. However, if an
algorithm is not stable, then it is possible that (D, 5) may come before (B, 5) in the sorted
output.
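
The example can be reproduced with a sort that is documented to be stable. The sketch below assumes Python's built-in `sorted`, which guarantees stability, and sorts on the second value of each pair only.

```python
pairs = [("A", 3), ("B", 5), ("C", 2), ("D", 5), ("E", 4)]

# sorted() is stable: items with equal keys keep their input order,
# so ("B", 5) stays ahead of ("D", 5).
by_second = sorted(pairs, key=lambda pair: pair[1])
```

Because the key ignores the letter, the relative order of B and D in the result is decided by stability alone.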


Examples of stable sorting algorithms include mergesort (although the
stability of mergesort is dependent upon how it is implemented). Examples of unstable
sorting algorithms include quicksort (quicksort could be made stable, but then it
wouldn’t be quick any more). Stability is a useful property when the total ordering relation
is dependent upon initial position. Using a stable sorting algorithm means that sorting by
ascending position for equal keys is built-in, and need not be implemented explicitly in the
comparison operator.
Version: 3 Owner: Logan Author(s): Logan 
1.90 standard deviation 


Given a random variable X, the standard deviation of X is defined as


$$SD[X] = \sqrt{Var[X]}.$$


The standard deviation is a measure of the variation of X around the expected value.


Version: 1 Owner: Riemann Author(s): Riemann 


1.91 stochastic independence 


The random variables $X_1, X_2, \ldots, X_n$ are stochastically independent (or just independent)
if

$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n) \quad \forall (x_1, \ldots, x_n) \in \mathbb{R}^n$$


That is, the random variables $X_1, \ldots, X_n$ are independent if their joint distribution function can
be expressed as the product of the marginal distribution functions of the variables, evaluated at the
corresponding points.


This definition implies all the following:


1. $F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n)\ \ \forall (x_1, \ldots, x_n) \in \mathbb{R}^n$ (joint cumulative distribution)
2. $M_{X_1,\ldots,X_n}(t_1, \ldots, t_n) = M_{X_1}(t_1) \cdots M_{X_n}(t_n)\ \ \forall (t_1, \ldots, t_n)$ (moment generating function)
3. $E\left[\prod_{i=1}^{n} X_i\right] = \prod_{i=1}^{n} E[X_i]$ (expectation)


However, only the first two above imply independence. See also correlation.
There are other definitions of independence, too. 


Version: 3 Owner: Riemann Author(s): Riemann 


1.92 substring 


Given a string $s \in \Sigma^*$, a string t is a substring of s if s = utv for some strings $u, v \in \Sigma^*$.
For example, lp, al, ha, alpha, and λ (the empty string) are all substrings of the string alpha.
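
The definition can be turned directly into an enumeration. The following sketch assumes Python strings as the model of strings over an alphabet; the helper name `substrings` is illustrative only.

```python
def substrings(s: str) -> set[str]:
    """All t such that s = u + t + v for some (possibly empty) u and v."""
    return {
        s[i:j]
        for i in range(len(s) + 1)
        for j in range(i, len(s) + 1)
    }

subs = substrings("alpha")
```

The slice `s[i:j]` plays the role of t, with `s[:i]` as u and `s[j:]` as v; taking i = j yields the empty string.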


Version: 2 Owner: Logan Author(s): Logan 
1.93 successor 
Given a set S, the successor of S is the set $S \cup \{S\}$. One often denotes the successor of S


by S. 


Version: 1 Owner: djao Author(s): djao 
1.94 sum rule


The sum rule states that

$$\frac{D}{Dx}\left[f(x) + g(x)\right] = f'(x) + g'(x)$$


Proof


See the proof of the sum rule.


Examples


$$\frac{D}{Dx}(x + 1) = \frac{D}{Dx}x + \frac{D}{Dx}1 = 1$$

$$\frac{D}{Dx}(x^2 - 3x + 2) = \frac{D}{Dx}x^2 + \frac{D}{Dx}(-3x) + \frac{D}{Dx}2 = 2x - 3$$

$$\frac{D}{Dx}(\sin x + \cos x) = \frac{D}{Dx}\sin x + \frac{D}{Dx}\cos x = \cos x - \sin x$$

Version: 3 Owner: Logan Author(s): Logan 


1.95 superset 


Given two sets A and B, A is a superset of B if every element in B is also in A. We
denote this relation as $A \supseteq B$. This is equivalent to saying that B is a subset of A, that is
$A \supseteq B \Leftrightarrow B \subseteq A$.


Similar rules that hold for $\subseteq$ also hold for $\supseteq$. If $X \supseteq Y$ and $Y \supseteq X$, then X = Y. Every set
is a superset of itself, and every set is a superset of the empty set.


A is a proper superset of B if $A \supseteq B$ and $A \neq B$. This relation is often denoted as $A \supset B$.
Unfortunately, $A \supset B$ is often used to mean the more general superset relation, and thus it
should be made explicit when proper superset is intended.


Version: 2 Owner: Logan Author(s): Logan 
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1.96 symmetric polynomial 


A polynomial $f \in R[x_1, \dots, x_n]$ in $n$ variables with coefficients in a ring $R$ is symmetric if
$\sigma(f) = f$ for every permutation $\sigma$ of the set $\{x_1, \dots, x_n\}$.


Every symmetric polynomial can be written as a polynomial expression in the elementary symmetric polynomials.


Version: 2 Owner: djao Author(s): djao 


1.97 the argument principle 


the argument principle 


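1: $f$ meromorphic in $\Omega$
2: $\forall\, 0 \le i \le n : f(a_i) = 0$
3: $\forall\, 0 \le i \le m : f(b_i) = \infty$
4: $\gamma$ cycle
5: $\gamma$ homologous to zero with respect to $\Omega$
6: $\forall a_i \notin \mathrm{im}(\gamma) : \forall b_i \notin \mathrm{im}(\gamma) : \frac{1}{2\pi i} \int_\gamma f'(z)/f(z)\,dz = \sum_{j=0}^{n} \mathrm{ind}_\gamma(a_j) - \sum_{j=0}^{m} \mathrm{ind}_\gamma(b_j)$

Note: This is a “seed” entry written using a short-hand format described in this FAQ.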


Version: 1 Owner: bwebste Author(s): apmxi 


1.98 torsion-free module 


torsion-free module 


1: $R$ integral domain
2: $X$ left module over $R$
3: $X_t$ torsion submodule
4: $X_t = 0$


fact: a finitely generated torsion-free submodule is a free module


1: $X$ finitely generated $R$-module
2: $X$ torsion-free
3: $X$ free


(to be fixed)
Note: This is a “seed” entry written using a short-hand format described in this FAQ.


Version: 2 Owner: drini Author(s): drini, apmxi 


1.99 total order 


A total order is a special case of a partial order. If $\le$ is a partial order on $A$, then it
satisfies the following three properties:


1. reflexivity: $a \le a$ for all $a \in A$
2. antisymmetry: If $a \le b$ and $b \le a$ for any $a, b \in A$, then $a = b$
3. transitivity: If $a \le b$ and $b \le c$ for any $a, b, c \in A$, then $a \le c$


The relation $\le$ is a total order if it satisfies the above three properties and the following
additional property:


4. Comparability: For any $a, b \in A$, either $a \le b$ or $b \le a$.


Version: 2 Owner: Logan Author(s): Logan 


1.100 tree traversals 


A tree traversal is an algorithm for visiting all the nodes in a rooted tree exactly once.
The constraint is on rooted trees, because the root is taken to be the starting point of the
traversal. A traversal is also defined on a forest in the sense that each tree in the forest can
be iteratively traversed (provided one knows the roots of every tree beforehand). This entry
presents a few common and simple tree traversals.


In the description of a tree, the notion of rooted-subtrees was presented. Full understanding
of this notion is necessary to understand the traversals presented here, as each of these
traversals depends heavily upon this notion.


In a traversal, there is the notion of visiting a node. Visiting a node often consists of doing
some computation with that node. The traversals are defined here without any notion of
what is being done to visit a node, and simply indicate where the visit occurs (and most
importantly, in what order).


Examples of each traversal will be illustrated on the following binary tree.


[Figure: the example binary tree.]


Vertices will be numbered in the order they are visited, and edges will be drawn with arrows
indicating the path of the traversal.


Preorder Traversal 


Given a rooted tree, a preorder traversal consists of first visiting the root, and then
executing a preorder traversal on each of the root’s children (if any).


For example


[Figure: preorder traversal of the example binary tree.]


The term preorder refers to the fact that a node is visited before any of its descendants.
A preorder traversal is defined for any rooted tree. As pseudocode, the preorder traversal
of a binary tree is


Algorithm PREORDERTRAVERSAL(x, VISIT)
Input: A node x of a binary tree, with children left(x) and right(x), and some computation
       VISIT defined for x
Output: Visits nodes of subtree rooted at x in a preorder traversal
begin
    VISIT(x)
    PREORDERTRAVERSAL(left(x), VISIT)
    PREORDERTRAVERSAL(right(x), VISIT)
end


Postorder Traversal 


Given a rooted tree, a postorder traversal consists of first executing a postorder traversal
on each of the root’s children (if any), and then visiting the root.


For example


[Figure: postorder traversal of the example binary tree.]


As with the preorder traversal, the term postorder here refers to the fact that a node is
visited after all of its descendants. A postorder traversal is defined for any rooted tree. As
pseudocode, the postorder traversal of a binary tree is


Algorithm POSTORDERTRAVERSAL(x, VISIT)
Input: A node x of a binary tree, with children left(x) and right(x), and some computation
       VISIT defined for x
Output: Visits nodes of subtree rooted at x in a postorder traversal
begin
    POSTORDERTRAVERSAL(left(x), VISIT)
    POSTORDERTRAVERSAL(right(x), VISIT)
    VISIT(x)
end


In-order Traversal 


Given a binary tree, an in-order traversal consists of executing an in-order traversal on
the root’s left child (if present), then visiting the root, then executing an in-order traversal
on the root’s right child (if present). Thus all of a root’s left descendants are visited before
the root, and the root is visited before any of its right descendants.


For example


[Figure: in-order traversal of the example binary tree.]


As can be seen, the in-order traversal has the useful property of traversing a tree from
left to right (if the tree is visualized as it has been drawn here). The term in-order comes
from the fact that an in-order traversal of a binary search tree visits the data associated with
the nodes in sorted order. As pseudocode, the in-order traversal is


Algorithm INORDERTRAVERSAL(x, VISIT)
Input: A node x of a binary tree, with children left(x) and right(x), and some computation
       VISIT defined for x
Output: Visits nodes of subtree rooted at x in an in-order traversal
begin
    INORDERTRAVERSAL(left(x), VISIT)
    VISIT(x)
    INORDERTRAVERSAL(right(x), VISIT)
end
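
The three traversals can be sketched in Python as follows. This is an illustrative sketch rather than part of the entry: the `Node` class, the use of `None` for a missing child, and the sample tree are assumptions made here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Node:
    """A binary tree node; a missing child is represented by None."""
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(x: Optional[Node], visit: Callable[[Node], None]) -> None:
    if x is None:
        return
    visit(x)                    # root first
    preorder(x.left, visit)
    preorder(x.right, visit)


def postorder(x: Optional[Node], visit: Callable[[Node], None]) -> None:
    if x is None:
        return
    postorder(x.left, visit)
    postorder(x.right, visit)
    visit(x)                    # root last


def inorder(x: Optional[Node], visit: Callable[[Node], None]) -> None:
    if x is None:
        return
    inorder(x.left, visit)
    visit(x)                    # root between its two subtrees
    inorder(x.right, visit)


def collect(traversal, root):
    """Run a traversal with a visit that records each node's value."""
    seen = []
    traversal(root, lambda node: seen.append(node.value))
    return seen


# A hypothetical binary search tree on the keys 1..7.
tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
```

Calling `collect(inorder, tree)` on this binary search tree yields the keys in sorted order, which is the property that gives the in-order traversal its name.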


Version: 3 Owner: Logan Author(s): Logan 


1.101 trie 


A trie is a tree for storing a set of strings in which there is one node for every prefix
of every string in the set. The name comes from the word retrieval, and thus is pronounced
the same as tree (which leads to much confusion when spoken aloud). The word retrieval is
stressed, because a trie has a lookup time that is proportional to the length of the string being
looked up.


If a trie is to store some set of strings $S \subseteq \Sigma^*$ (where $\Sigma$ is an alphabet), then it takes the
following form. Each edge leading to a non-leaf node in the trie is labelled by an element
of $\Sigma$. Any edge leading to a leaf node is labelled by \$ (some symbol not in $\Sigma$). For every
string $s \in S$, there is a path from the root of the trie to a leaf, the labels of which when
concatenated form $s + \$$ (where $+$ is the string concatenation operator). For every path
from the root of the trie to a leaf, the labels of the edges concatenated form some string in
$S$.


Example


Suppose we wish to store the set of strings $S := \{$alpha, beta, bear, beast, beat$\}$. The trie that
stores $S$ would be


[Figure: the trie storing S, with edges labelled by letters and each leaf reached by an edge labelled \$.]

Version: 4 Owner: Logan Author(s): Logan 


1.102 unit vector 


A unit vector is a vector with a length, or vector norm, of one. In $\mathbb{R}^n$, one can obtain such a
vector by dividing a nonzero vector $v$ by its magnitude $|v|$. For example, we have a vector $\langle 1, 2, 3 \rangle$.
A unit vector pointing in this direction would be

$\frac{1}{|\langle 1, 2, 3 \rangle|} \langle 1, 2, 3 \rangle = \frac{1}{\sqrt{14}} \langle 1, 2, 3 \rangle = \langle \frac{1}{\sqrt{14}}, \frac{2}{\sqrt{14}}, \frac{3}{\sqrt{14}} \rangle$.

The magnitude of this vector is 1.
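
The normalization step can be checked numerically. This is a minimal sketch using only the standard library; the function names are chosen here for illustration.

```python
import math


def norm(v):
    """Euclidean length of a vector given as a sequence of numbers."""
    return math.sqrt(sum(c * c for c in v))


def normalize(v):
    """Return the unit vector pointing in the direction of v."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return [c / length for c in v]


unit = normalize([1, 2, 3])  # each component is divided by sqrt(14)
```

Up to floating-point rounding, `norm(unit)` is 1.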


Version: 7 Owner: slider142 Author(s): slider142 


1.103 unstable fixed point 
A fixed point is considered unstable if it is neither attracting nor Liapunov stable. A saddle
point is an example of such a fixed point.


Version: 1 Owner: armbrusterb Author(s): armbrusterb 


1.104 weak* convergence in normed linear space 


weak* convergence in normed linear space 


1: $(x'_n) \subset X'$
2: $X$ a Banach space
3: $\exists x' \in X' : \forall x \in X : x'_n(x) \to x'(x) \Leftrightarrow x'_n \xrightarrow{w^*} x'$
4: If $X$ is reflexive, then weak* convergence is the same as weak convergence


Note: This is a “seed” entry written using a short-hand format described in this FAQ.


Version: 1 Owner: bwebste Author(s): apmxi 


1.105 well-ordering principle for natural numbers 


Every nonempty set $S$ of nonnegative integers contains a least element; that is, there is some
integer $a$ in $S$ such that $a \le b$ for all $b$ belonging to $S$.


For example, the positive integers are a well-ordered set under the standard order.


Version: 5 Owner: KimJ Author(s): KimJ 
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Chapter 2 


00-01 — Instructional exposition 
(textbooks, tutorial papers, etc.) 


2.1 dimension 


The word dimension in mathematics has many definitions, but all of them are trying to 
quantify our intuition that, for example, a sheet of paper has somehow one less dimension 
than a stack of papers. 


One common way to define dimension is through some notion of a number of independent
quantities needed to describe an element of an object. For example, it is natural to say
that the sheet of paper is two-dimensional because one needs two real numbers to specify
a position on the sheet, whereas the stack of papers is three-dimensional because a position
in a stack is specified by a sheet and a position on the sheet. Following this notion, in
linear algebra the dimension of a vector space is defined as the minimal number of vectors
such that every other vector in the vector space is representable as a linear combination of these.
Similarly, the word rank denotes various dimension-like invariants that appear throughout algebra.


However, if we try to generalize this notion to mathematical objects that do not possess
an algebraic structure, then we run into a difficulty. From the point of view of set theory
there are as many real numbers as pairs of real numbers, since there is a bijection from real
numbers to pairs of real numbers. To distinguish a plane from a cube one needs to impose
restrictions on the kind of mapping. Surprisingly, it turns out that continuity is not
enough, as was pointed out by Peano. There are continuous functions that map a square
onto a cube. So, in topology one uses another intuitive notion, that in a high-dimensional
space there are more directions than in a low-dimensional one. Hence, the (Lebesgue covering)
dimension of a topological space is defined as the smallest number $d$ such that every covering
of the space by open sets can be refined so that no point is contained in more than $d + 1$ sets.
For example, no matter how one covers a sheet of paper by sufficiently small other sheets
of paper such that two sheets can overlap each other, but cannot merely touch, one will
always find a point that is covered by $2 + 1 = 3$ sheets.


Another definition of dimension rests on the idea that higher-dimensional objects are in some
sense larger than the lower-dimensional ones. For example, to cover a cube with a side length
2 one needs at least $2^3 = 8$ cubes with a side length 1, but a square with a side length 2 can
be covered by only $2^2 = 4$ unit squares. Let $N(\epsilon)$ be the minimal number of open balls in
any covering of a bounded set $S$ by balls of radius $\epsilon$. The Besicovitch-Hausdorff dimension
of $S$ is defined as $-\lim_{\epsilon \to 0} \log_\epsilon N(\epsilon)$. The Besicovitch-Hausdorff dimension is not always
defined, and when defined it might be non-integral.


Version: 4 Owner: bbukh Author(s): bbukh 


2.2 toy theorem 


A toy theorem is a simplified version of a more general theorem. For instance, by intro- 
ducing some simplifying assumptions in a theorem, one obtains a toy theorem. 


Usually, a toy theorem is used to illustrate the claim of a theorem. It can also be illustrative
and insightful to study proofs of a toy theorem derived from a non-trivial theorem. Toy
theorems also have great educational value. After presenting a theorem (with, say, a highly
non-trivial proof), one can sometimes give some assurance that the theorem really holds, by
proving a toy version of the theorem.


For instance, a toy theorem of the Brouwer fixed point theorem is obtained by restricting the
dimension to one. In this case, the Brouwer fixed point theorem follows almost immediately
from the intermediate value theorem.


Version: 1 Owner: matte Author(s): matte 
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Chapter 3 


00-XX — General


3.1 method of exhaustion 


The method of exhaustion is calculating an area by approximating it by the areas of a
sequence of polygons.


For example, one can fill up the interior of a circle by inscribing polygons with more and more
sides.


Version: 1 Owner: vladm Author(s): vladm 
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Chapter 4 


00A05 — General mathematics 


4.1 Conway’s chained arrow notation 


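Conway’s chained arrow notation is a way of writing numbers even larger than those
provided by the up arrow notation. We define

$m \to n \to p = m^{(p+2)} n = m \uparrow \cdots \uparrow n$ (with $p$ arrows)

and

$m \to n = m \to n \to 1 = m^n$.

Longer chains are evaluated by

$m \to \cdots \to n \to p \to 1 = m \to \cdots \to n \to p$
$m \to \cdots \to n \to 1 \to q = m \to \cdots \to n$

and

$m \to \cdots \to n \to p+1 \to q+1 = m \to \cdots \to n \to (m \to \cdots \to n \to p \to q+1) \to q$

For example:

$3 \to 3 \to 2 =$
$3 \to (3 \to 2 \to 2) \to 1 =$
$3 \to (3 \to 2 \to 2) =$
$3 \to (3 \to (3 \to 1 \to 2) \to 1) =$
$3 \to (3 \to 3 \to 1) =$
$3^{(3^3)} =$
$3^{27} = 7625597484987$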
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A much larger example is: 


3727-4545 

3—2 — (38-2534) 3 3= 

3—2 — (3 -2- (3 - 2 — 2 — 4) > 3) > 3 = 

3—2 — (3 — 2 — (8 — 2 — (8 — 2—1 —4)> 3) > 3) > 3 = 
3 > 2 — (3 — 2 > (3 = 2 > (3 — 2) > 3) > 3) > 3 = 

3—2 — (3—2 > (3 - 2-9 > 3) = 3) —3 


Clearly this is going to be a very large number. Note that, as large as it is, it is proceeding 
towards an eventual final evaluation, as evidenced by the fact that the final number in the 
chain is getting smaller. 
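
The evaluation rules for chains translate into a short recursive function. The sketch below is an illustration, not part of the entry; the function name and the list representation of a chain are assumptions made here, and it is only practical for very small chains because the values grow extremely quickly.

```python
def conway(chain):
    """Evaluate a Conway chain given as a list of positive integers."""
    chain = list(chain)
    if len(chain) == 1:
        return chain[0]
    if len(chain) == 2:
        m, n = chain
        return m ** n                     # m -> n = m^n
    *head, p, q = chain
    if q == 1:
        return conway(head + [p])         # a trailing 1 is dropped
    if p == 1:
        return conway(head)               # ... -> 1 -> q drops both entries
    # ... -> p -> q  =  ... -> (... -> p-1 -> q) -> q-1
    inner = conway(head + [p - 1, q])
    return conway(head + [inner, q - 1])
```

Evaluating `conway([3, 3, 2])` follows the same steps as the worked example above and ends in $3^{27}$.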


Version: 4 Owner: Henry Author(s): Henry 


4.2 Knuth’s up arrow notation 


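Knuth’s up arrow notation is a way of writing numbers which would be unwieldy in
standard decimal notation. It expands on the exponential notation $m \uparrow n = m^n$. Define
$m \uparrow\uparrow 0 = 1$ and $m \uparrow\uparrow n = m \uparrow (m \uparrow\uparrow [n - 1])$.


Obviously $m \uparrow\uparrow 1 = m^1 = m$, so $3 \uparrow\uparrow 2 = 3^{3 \uparrow\uparrow 1} = 3^3 = 27$, but $2 \uparrow\uparrow 3 = 2^{2 \uparrow\uparrow 2} = 2^{2^{2 \uparrow\uparrow 1}} = 2^{(2^2)} = 16$.


In general, $m \uparrow\uparrow n = m^{m^{\cdot^{\cdot^{m}}}}$, a tower of height $n$.
Clearly, this process can be extended: $m \uparrow\uparrow\uparrow 0 = 1$ and $m \uparrow\uparrow\uparrow n = m \uparrow\uparrow (m \uparrow\uparrow\uparrow [n - 1])$.


An alternate notation is to write $m^{(i)} n$ for $m \uparrow \cdots \uparrow n$ with $i - 2$ arrows. ($i - 2$ times because
then $m^{(2)} n = m \cdot n$ and $m^{(1)} n = m + n$.) Then in general we can define $m^{(i)} n = m^{(i-1)}(m^{(i)}(n - 1))$.


To get a sense of how quickly these numbers grow, $3 \uparrow\uparrow\uparrow 2 = 3 \uparrow\uparrow 3$ is more than seven and
a half trillion, and the numbers continue to grow much more than exponentially.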
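
The recursive definition can be written out directly. This is a minimal sketch, with the function name and argument order chosen here for illustration; it is only usable for tiny arguments, since the results quickly exceed anything that can be stored.

```python
def up_arrow(m, arrows, n):
    """Compute m followed by `arrows` up arrows and then n, for arrows >= 1."""
    if arrows == 1:
        return m ** n                 # a single arrow is exponentiation
    if n == 0:
        return 1                      # base case of every higher operation
    # m ^(k) n = m ^(k-1) (m ^(k) (n - 1))
    return up_arrow(m, arrows - 1, up_arrow(m, arrows, n - 1))
```

For example, `up_arrow(3, 2, 2)` is 27 and `up_arrow(2, 2, 3)` is 16, matching the values computed above, and `up_arrow(3, 3, 2)` agrees with `up_arrow(3, 2, 3)`.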


Version: 3 Owner: Henry Author(s): Henry 


4.3 arithmetic progression 


An arithmetic progression of length $n$, initial term $a_1$, and common difference $d$ is the sequence
$a_1, a_1 + d, a_1 + 2d, \dots, a_1 + (n - 1)d$.

The sum of terms of an arithmetic progression can be computed using Gauss’s trick:

$S = (a_1 + 0) + (a_1 + d) + \cdots + (a_1 + (n-1)d)$
$+ S = (a_1 + (n-1)d) + (a_1 + (n-2)d) + \cdots + (a_1 + 0)$
$2S = (2a_1 + (n-1)d) + (2a_1 + (n-1)d) + \cdots + (2a_1 + (n-1)d)$


We just add the sum to itself written backwards, and the sum of each of the columns equals
$(2a_1 + (n - 1)d)$. The sum is then

$S = \frac{(2a_1 + (n - 1)d)\,n}{2}$
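
The closed form can be compared against a direct summation. This is a small sketch for integer progressions; the function names are assumptions made here.

```python
def progression(a1, d, n):
    """The arithmetic progression a1, a1 + d, ..., a1 + (n - 1) d as a list."""
    return [a1 + k * d for k in range(n)]


def progression_sum(a1, d, n):
    """Sum of the progression by Gauss's trick: (2 a1 + (n - 1) d) n / 2."""
    # For integers the product is always even: either n is even, or n - 1 is
    # even and then 2*a1 + (n - 1)*d is even. Integer division is exact.
    return (2 * a1 + (n - 1) * d) * n // 2
```

For the progression 1, 2, ..., 100 the formula gives `progression_sum(1, 1, 100)`, the sum usually attributed to Gauss's schoolroom calculation.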


Version: 3 Owner: bbukh Author(s): bbukh 


4.4 arity 


The arity of something is the number of arguments it takes. This is usually applied to
functions: an $n$-ary function is one that takes $n$ arguments. Unary is a synonym for 1-ary,
and binary for 2-ary.


Version: 1 Owner: Henry Author(s): Henry 


4.5 introducing 0th power


Let $a$ be a number. Then for all $n \in \mathbb{N}$, $a^n$ is the product of $n$ $a$’s. For integers (and their
extensions) we have a “multiplicative identity” called “1”, i.e. $a \cdot 1 = a$ for all $a$. So we can
write

$a^n = a^{n+0} = a^n \cdot 1$.


From the definition of the power of $a$ the usual laws can be derived; so it is plausible to set
$a^0 = 1$, since 0 doesn’t change a sum, like 1 doesn’t change the product.


Version: 4 Owner: Thomas Heye Author(s): Thomas Heye 


4.6 lemma 


There is no technical distinction between a lemma and a theorem. A lemma is a proven
statement, typically named a lemma to distinguish it as a truth used as a stepping stone to
a larger result rather than an important statement in and of itself. Of course, some of the
most powerful statements in mathematics are known as lemmas, so one clearly can’t get too
much simply by reading into a proposition’s name.


According to [1], the plural “lemmas” is commonly used. The correct plural of lemma,
however, is lemmata.


REFERENCES 


1. N. Higham, Handbook of writing for the mathematical sciences, Society for Industrial and 
Applied Mathematics, 1998. (pp. 16) 


Version: 5 Owner: mathcam Author(s): mathcam 


4.7 property 


Given each element of a set $X$, a property is either true or false. Formally, a property is a function
$P : X \to \{$true, false$\}$. Any property gives rise in a natural way to the set $\{x : x$ has the property $P\}$
and the corresponding characteristic function.


Version: 3 Owner: fibonaci Author(s): bbukh, fibonaci, apmxi 


4.8 saddle point approximation 


The saddle point approximation (SPA), a.k.a. stationary phase approximation, is a widely
used method in quantum field theory (QFT) and related fields. Suppose we want to evaluate
the following integral in the limit $\zeta \to \infty$:

$I = \lim_{\zeta \to \infty} \int_{-\infty}^{\infty} dx\, e^{-\zeta f(x)}$. (4.8.1)


The saddle point approximation can be applied if the function $f(x)$ satisfies certain conditions.
Assume that $f(x)$ has a global minimum $f(x_0) = y_{min}$ at $x = x_0$, which is sufficiently
separated from other local minima and whose value is sufficiently smaller than the value of
those. Consider the Taylor expansion of $f(x)$ about the point $x_0$:

$f(x) = f(x_0) + \partial_x f(x)\big|_{x=x_0} (x - x_0) + \frac{1}{2} \partial_x^2 f(x)\big|_{x=x_0} (x - x_0)^2 + O((x - x_0)^3)$. (4.8.2)


Since $f(x_0)$ is a (global) minimum, it is clear that $f'(x_0) = 0$. Therefore $f(x)$ may be
approximated to quadratic order as

$f(x) \approx f(x_0) + \frac{1}{2} f''(x_0) (x - x_0)^2$. (4.8.3)

The above assumptions on the minima of $f(x)$ ensure that the dominant contribution to
(4.8.1) in the limit $\zeta \to \infty$ will come from the region of integration around $x_0$:

$I \approx \lim_{\zeta \to \infty} e^{-\zeta f(x_0)} \int_{-\infty}^{\infty} dx\, e^{-\frac{\zeta}{2} f''(x_0) (x - x_0)^2}$ (4.8.4)

$\approx \lim_{\zeta \to \infty} e^{-\zeta f(x_0)} \left( \frac{2\pi}{\zeta f''(x_0)} \right)^{1/2}$.


In the last step we have performed the Gaussian integral. The next nonvanishing higher order
correction to (4.8.4) stems from the quartic term of the expansion (4.8.2). This correction
may be incorporated into (4.8.4) to yield (after expanding part of the exponential):

$I \approx \lim_{\zeta \to \infty} e^{-\zeta f(x_0)} \int_{-\infty}^{\infty} dx\, e^{-\frac{\zeta}{2} f''(x_0) (x - x_0)^2} \left( 1 - \frac{\zeta}{4!} \left( \partial_x^4 f(x) \right)\big|_{x=x_0} (x - x_0)^4 \right)$. (4.8.5)
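
The quality of the leading-order result (4.8.4) can be probed numerically. The sketch below is an illustration only: the test function $f(x) = \cosh x - 1$ (with $x_0 = 0$, $f(x_0) = 0$, $f''(x_0) = 1$), the truncation of the integration range and the step count are all assumptions made here.

```python
import math


def f(x):
    """Hypothetical test function with a single global minimum at x0 = 0."""
    return math.cosh(x) - 1.0


def integral(zeta, half_width=3.0, steps=20000):
    """Trapezoidal estimate of the integral of exp(-zeta * f(x)) over [-w, w].

    The integrand is negligible outside this range for the zeta used here.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-zeta * f(x))
    return total * h


def saddle_point(zeta, f0=0.0, f2=1.0):
    """Leading-order estimate exp(-zeta f(x0)) * sqrt(2 pi / (zeta f''(x0)))."""
    return math.exp(-zeta * f0) * math.sqrt(2.0 * math.pi / (zeta * f2))


def ratio(zeta):
    """Numerical integral divided by the saddle point estimate."""
    return integral(zeta) / saddle_point(zeta)
```

The ratio approaches 1 as $\zeta$ grows, and the remaining deviation is the kind of effect that the quartic correction in (4.8.5) accounts for.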


...to be continued with applications to physics... 


Version: 2 Owner: msihl Author(s): msihl 


4.9 singleton 


A set consisting of a single element is usually referred to as a singleton. 


Version: 2 Owner: Koro Author(s): Koro 


4.10 subsequence 


If $X$ is a set and $(a_n)_{n \in \mathbb{N}}$ is a sequence in $X$, then a subsequence of $(a_n)$ is a sequence of
the form $(a_{n_r})_{r \in \mathbb{N}}$ where $(n_r)_{r \in \mathbb{N}}$ is a strictly increasing sequence of natural numbers.


Version: 2 Owner: Evandar Author(s): Evandar 


4.11 surreal number 


The surreal numbers are a generalization of the reals. Each surreal number consists of two
parts (called the left and right), each of which is a set of surreal numbers. For any surreal
number $N$, these parts can be called $N_L$ and $N_R$. (This could be viewed as an ordered pair of
sets, however the surreal numbers were intended to be a basis for mathematics, not something
to be embedded in set theory.) A surreal number is written $N = \langle N_L \mid N_R \rangle$.


Not every number of this form is a surreal number. The surreal numbers satisfy two additional
properties. First, if $x \in N_R$ and $y \in N_L$ then $x \not\le y$. Secondly, they must be well
founded. These properties are both satisfied by the following construction of the surreal
numbers and the $\le$ relation by mutual induction:


$\langle \mid \rangle$, which has both left and right parts empty, is 0.


Given two (possibly empty) sets of surreal numbers $R$ and $L$ such that for any $x \in R$ and
$y \in L$, $x \not\le y$, $\langle L \mid R \rangle$ is a surreal number.


Define $N \le M$ if there is no $x \in N_L$ such that $M \le x$ and no $y \in M_R$ such that $y \le N$.


This process can be continued transfinitely, to define infinite and infinitesimal numbers. For
instance if $\mathbb{Z}$ is the set of integers then $\omega = \langle \mathbb{Z} \mid \rangle$. Note that this does not make equality the
same as identity: $\langle -1 \mid 1 \rangle = \langle \mid \rangle$, for instance.


It can be shown that $N$ is “sandwiched” between the elements of $N_L$ and $N_R$: it is larger
than any element of $N_L$ and smaller than any element of $N_R$.


Addition of surreal numbers is defined by

$N + M = \langle \{N + x \mid x \in M_L\} \cup \{M + y \mid y \in N_L\} \mid \{N + x \mid x \in M_R\} \cup \{M + y \mid y \in N_R\} \rangle$


It follows that $-N = \langle -N_R \mid -N_L \rangle$.


The definition of multiplication can be written more easily by defining $M \cdot N_L = \{M \cdot x \mid
x \in N_L\}$ and similarly for $N_R$.


Then

$N \cdot M = \langle M \cdot N_L + N \cdot M_L - N_L \cdot M_L,\; M \cdot N_R + N \cdot M_R - N_R \cdot M_R \mid$
$M \cdot N_L + N \cdot M_R - N_L \cdot M_R,\; M \cdot N_R + N \cdot M_L - N_R \cdot M_L \rangle$


The surreal numbers satisfy the axioms for a field under addition and multiplication (whether
they really are a field is complicated by the fact that they are too large to be a set).


The integers of surreal mathematics are called the omnific integers. In general positive
integers $n$ can always be written $\langle n - 1 \mid \rangle$ and so $-n = \langle \mid 1 - n \rangle = \langle \mid (-n) + 1 \rangle$. So for
instance $1 = \langle 0 \mid \rangle$.


In general, $\langle a \mid b \rangle$ is the simplest number between $a$ and $b$. This can be easily used to define
the dyadic fractions: for any integer $a$, $a + \frac{1}{2} = \langle a \mid a + 1 \rangle$. Then $\frac{1}{2} = \langle 0 \mid 1 \rangle$, $\frac{1}{4} = \langle 0 \mid \frac{1}{2} \rangle$,
and so on. This can then be used to locate non-dyadic fractions by pinning them between
a left part which gets infinitely close from below and a right part which gets infinitely close
from above.


Infinite numbers can be defined starting with $\omega$ as defined above and adding numbers
such as $\langle \omega \mid \rangle = \omega + 1$ and so on. Similarly, a starting infinitesimal can be found as
$\langle 0 \mid 1, \frac{1}{2}, \frac{1}{4}, \cdots \rangle = \frac{1}{\omega}$, and again more can be developed from there.
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Chapter 5 


00A07 — Problem books 


5.1 Nesbitt’s inequality 


Nesbitt’s inequality says that for positive real $a$, $b$ and $c$ we have:

$\frac{a}{b+c} + \frac{b}{a+c} + \frac{c}{a+b} \ge \frac{3}{2}$.
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5.2 proof of Nesbitt’s inequality 


Starting from Nesbitt’s inequality

$\frac{a}{b+c} + \frac{b}{a+c} + \frac{c}{a+b} \ge \frac{3}{2}$

we transform the left hand side:

$\frac{a+b+c}{b+c} + \frac{a+b+c}{a+c} + \frac{a+b+c}{a+b} - 3 \ge \frac{3}{2}$.


Now this can be transformed into:

$((a+b) + (a+c) + (b+c)) \left( \frac{1}{a+b} + \frac{1}{a+c} + \frac{1}{b+c} \right) \ge 9$.


Division by 3 and the right factor yields:

$\frac{(a+b) + (a+c) + (b+c)}{3} \ge \frac{3}{\frac{1}{a+b} + \frac{1}{a+c} + \frac{1}{b+c}}$.

Now on the left we have the arithmetic mean and on the right the harmonic mean, so this
is true.
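
The inequality and the rearrangement used in the proof can be spot-checked with exact rational arithmetic. This is only a numerical sanity check under assumptions made here (function names, sample range and seed); it does not replace the proof.

```python
import random
from fractions import Fraction


def nesbitt_lhs(a, b, c):
    """Left hand side a/(b+c) + b/(a+c) + c/(a+b), computed exactly."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return a / (b + c) + b / (a + c) + c / (a + b)


def mean_form_holds(a, b, c):
    """Arithmetic mean >= harmonic mean for the numbers a+b, a+c and b+c."""
    terms = [Fraction(a + b), Fraction(a + c), Fraction(b + c)]
    arithmetic = sum(terms) / 3
    harmonic = 3 / sum(1 / t for t in terms)
    return arithmetic >= harmonic


def holds_on_samples(trials=1000, seed=0):
    """Check both forms on random positive integer triples."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.randint(1, 1000) for _ in range(3))
        if nesbitt_lhs(a, b, c) < Fraction(3, 2):
            return False
        if not mean_form_holds(a, b, c):
            return False
    return True
```

Equality occurs for $a = b = c$, where `nesbitt_lhs(1, 1, 1)` is exactly 3/2.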
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Chapter 6 


00A20 — Dictionaries and other 
general reference works 


6.1 completing the square 


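Let us consider the expression $x^2 + xy$, where $x$ and $y$ are real (or complex) numbers. Using
the identity

$(x + y)^2 = x^2 + 2xy + y^2$

we can write

$x^2 + xy = x^2 + xy + 0$
$= x^2 + xy + \frac{y^2}{4} - \frac{y^2}{4}$
$= \left( x + \frac{y}{2} \right)^2 - \frac{y^2}{4}$.


This manipulation is called completing the square [3] in $x^2 + xy$, or completing the square
$x^2$.


Replacing $y$ by $-y$, we also have

$x^2 - xy = \left( x - \frac{y}{2} \right)^2 - \frac{y^2}{4}$.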


Here are some applications of this method:


• Derivation of the solution formula to the quadratic equation.


• Completing the square can also be used to find the extremal value of a quadratic
polynomial [2] without calculus. Let us illustrate this for the polynomial $p(x) = 4x^2 + 8x + 9$.
Completing the square yields

$p(x) = (2x + 2)^2 - 4 + 9$
$= (2x + 2)^2 + 5$
$\ge 5$,

since $(2x + 2)^2 \ge 0$. Here, equality holds if and only if $x = -1$. Thus $p(x) \ge 5$ for all
$x \in \mathbb{R}$, and $p(x) = 5$ if and only if $x = -1$. It follows that $p(x)$ has a global minimum
at $x = -1$, where $p(-1) = 5$.


• Completing the square can also be used as an integration technique to integrate, say,
$\frac{1}{4x^2 + 8x + 9}$ [3].
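
The same manipulation works for a general quadratic $ax^2 + bx + c$ with $a \ne 0$. The sketch below is an illustration with names chosen here; it returns the shift $h$ and constant $k$ in the form $a(x + h)^2 + k$, using exact rational arithmetic.

```python
from fractions import Fraction


def complete_square(a, b, c):
    """Return (h, k) such that a*x**2 + b*x + c == a*(x + h)**2 + k."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    h = b / (2 * a)
    k = c - b * b / (4 * a)
    return h, k


def evaluate(a, b, c, x):
    """Evaluate the quadratic a*x**2 + b*x + c at x."""
    return a * x * x + b * x + c


# The polynomial p(x) = 4x^2 + 8x + 9 from the text.
h, k = complete_square(4, 8, 9)
```

For this polynomial the result corresponds to $4(x + 1)^2 + 5 = (2x + 2)^2 + 5$, so the extremal value is $k$, attained at $x = -h$.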
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Chapter 7 


00A99 — Miscellaneous topics 


7.1 QED 


The term “QED” is actually an abbreviation and stands for the Latin quod erat demonstrandum,
meaning “which was to be demonstrated.”


QED typically is used to signify the end of a mathematical proof. The symbol

$\Box$

is often used in place of “QED,” and is called the “Halmos symbol” after mathematician
Paul Halmos (it can vary in width, however, and sometimes it is fully or partially shaded).
Halmos borrowed this symbol from magazines, where it was used to denote “end of article.”
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7.2 TFAE 


The abbreviation “TFAE” is shorthand for “the following are equivalent”. It is used before
a set of equivalent conditions (each implies all the others).


In a definition, when one of the conditions is somehow “better” (simpler, shorter, ...), it
makes sense to phrase the definition with that condition, and mention that the others are
equivalent. “TFAE” is typically used when none of the conditions can take priority over the
others. Actually proving the claimed equivalence must, of course, be done separately.
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7.3 WLOG 


“WLOG” (or “WOLOG”) is an acronym which stands for “without loss of generality.”


WLOG is invoked in situations where some property of a model or system is invariant
under the particular choice of instance attributes, but for the sake of demonstration, these
attributes must be fixed.

For example, we might be discussing properties of a segment (open or closed) of the real number
line. Due to the nature of the reals, we can select endpoints $a$ and $b$ without loss of generality.
Nothing about our discussion of this segment depends on the choice of $a$ or $b$. Of
course, any segment does actually have specific endpoints, so it may help to actually select
some (say 0 and 1) for clarity.


WLOG can also be invoked to shorten proofs where there are a number of choices of configuration,
but the proof is “the same” for each of them. We need only walk through the proof
for one of these configurations, and “WLOG” serves as a note that we haven’t lost anything
in the choosing.
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7.4 order of operations 


The order of operations is a convention that tells us how to evaluate mathematical expressions
(these could be purely numerical). The problem arises because expressions consist of
operators applied to variables or values (or other expressions) that each demand individual
evaluation, yet the order in which these individual evaluations are done leads to different
outcomes.


A conventional order of operations solves this. One could technically do without memorizing
this convention, but the only alternative is to use parentheses to group every single term of
an expression and evaluate the innermost operations first.


For example, in the expression $a \cdot b + c$, how do we know whether to apply multiplication or
addition first? We could interpret even this simple expression two drastically different ways:


1. Add $b$ and $c$,
2. Multiply the sum from (1) with $a$.

or

1. Multiply $a$ and $b$,
2. Add to the product in (1) the value of $c$.


One can see the different outcomes for the two cases by selecting some different values for $a$,
$b$, and $c$. The issue is resolved by convention in order of operations: the correct evaluation
would be the second one.


The nearly universal mathematical convention dictates the following order of operations (in
order of which operators should be evaluated first):


1. Factorial.
2. Exponentiation.
3. Multiplication.
4. Division.
5. Addition.


Any parenthesized expressions are automatically higher “priority” than anything on the
above list.


There is also the problem of what order to evaluate repeated operators of the same type, as
in:


a/b/c/d


The solution to this problem is typically to assume the left-to-right interpretation. For the
above, this would lead to the following evaluation:


(((a/b)/c)/d) 


In other words, 


1. Evaluate a/b. 
2. Evaluate (1)/c. 


3. Evaluate (2)/d. 


Note that this isn’t a problem for associative operators such as multiplication or addition in
the reals. One must still proceed with caution, however, as associativity is a notion bound
up with the concept of groups rather than just operators. Hence, context is extremely
important.


For more obscure operations than the ones listed above, parentheses should be used to remove
ambiguity. Completely new operations are typically assumed to have the highest priority,
but the definition of the operation should be accompanied by some sort of explanation of how
it is evaluated in relation to itself. For example, Knuth’s up arrow notation explicitly
defines what order repeated applications of itself should be evaluated in (it is right-to-left
rather than left-to-right)!
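
The effect of the different groupings can be seen on concrete numbers. The sketch below uses Python's own operator rules as an illustration; the sample values are arbitrary, and note that Python gives multiplication and division equal precedence and evaluates them left to right, which is one particular realization of the convention described above.

```python
# Arbitrary sample values chosen for illustration.
a, b, c, d = 2, 3, 4, 5

add_first = a * (b + c)        # interpretation 1: add, then multiply
multiply_first = (a * b) + c   # interpretation 2: multiply, then add
conventional = a * b + c       # unparenthesized: follows the convention

# Repeated division is grouped from the left ...
unparenthesized = a / b / c / d
left_to_right = ((a / b) / c) / d
right_to_left = a / (b / (c / d))

# ... while repeated exponentiation is grouped from the right.
power_chain = 2 ** 3 ** 2
```

Here `conventional` equals `multiply_first`, `unparenthesized` equals `left_to_right`, and `power_chain` equals `2 ** (3 ** 2)` rather than `(2 ** 3) ** 2`.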
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Chapter 8 


01A20 — Greek, Roman 


8.1 Roman numerals 


Roman numerals are a method of writing numbers employed primarily by the ancient
Romans. In place of digits, the Romans used letters to represent the numbers central to the
system:


I    1
V    5
X    10
L    50
C    100
D    500
M    1000


Larger numbers can be made by writing a bar over the letter, which means one thousand
times as much. For instance $\bar{V}$ is 5000.


Other numbers were written by putting letters together. For instance II means 2. Larger
letters go on the left, so LII is 52, but IIL is not a valid Roman numeral.


One additional rule allows a letter to the left of a larger letter to signify subtracting the
smaller from the larger. For instance IV is 4. This can only be done once; 3 is written III,
not IIV. Also, it is generally required that the smaller letter be the one immediately smaller
than the larger, so 1999 is usually written MCMXCIX, not MIM.


It is worth noting that today it is usually considered incorrect to repeat a letter four times,
so IV is preferred to IIII. However many older monuments do not use the subtraction rule
at all, so 44 was written XXXXIIII instead of the now preferable XLIV.
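
The rules above can be sketched as a pair of conversion functions. This is an illustration with names chosen here; it covers the modern subtractive convention for 1 to 3999 and ignores the bar notation, and the reader accepts additive forms such as XXXXIIII without validating them.

```python
# Values in decreasing order, including the six subtractive pairs.
PAIRS = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
         (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
         (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

LETTER = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def to_roman(n):
    """Write an integer 1 <= n <= 3999 in modern Roman numerals."""
    if not 0 < n < 4000:
        raise ValueError("only 1..3999 can be written without the bar notation")
    parts = []
    for value, letters in PAIRS:
        count, n = divmod(n, value)
        parts.append(letters * count)
    return "".join(parts)


def from_roman(s):
    """Read a Roman numeral: a letter before a larger one is subtracted."""
    total = 0
    for i, ch in enumerate(s):
        value = LETTER[ch]
        if i + 1 < len(s) and LETTER[s[i + 1]] > value:
            total -= value
        else:
            total += value
    return total
```

For example, `to_roman(1999)` gives MCMXCIX and `to_roman(44)` gives XLIV, while `from_roman("XXXXIIII")` reads the older additive form as 44.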
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Chapter 9 


01A55 — 19th century 


9.1 Poincaré, Jules Henri


Jules Henri Poincaré was born on April 29th, 1854 in Cité Ducale [BA], a neighborhood in
Nancy, a city in France. He was the son of Dr Léon Poincaré (1828-1892), who was a
professor at the University of Nancy in the faculty of medicine. [14] His mother, Eugénie
Launois (1830-1897), was described as a “gifted mother” [6] who gave special instruction to
her son. She was 24 and his father 26 years of age when Henri was born. Two years after
the birth of Henri they gave birth to his sister Aline. [6]


In 1862 Henri entered the Lycée of Nancy, which is today called, in his honor, the Lycée
Henri Poincaré. In fact the University of Nancy is also named in his honor. He graduated
from the Lycée in 1871 with a bachelor’s degree in letters and sciences. Henri was at the top
of his class in almost all subjects; he did not have much success in music and was described as
“average at best” in any physical activities. [9] This could be blamed on his poor eyesight
and absentmindedness. [4] Later, in 1873, Poincaré entered l’Ecole Polytechnique, where he
performed better in mathematics than all the other students. He published his first paper
at 20 years of age, titled Démonstration nouvelle des propriétés de l’indicatrice
d’une surface. [3] He graduated from the institution in 1876. The same year he decided
to attend l’Ecole des Mines and graduated in 1879 with a degree in mining engineering. [A]
After his graduation he was appointed as an ordinary engineer in charge of the mining
services in Vesoul. At the same time he was preparing for his doctorate in sciences (not
surprisingly, in mathematics) under the supervision of Charles Hermite. Some of Charles
Hermite’s most famous contributions to mathematics are: Hermite’s polynomials,
Hermite’s formula of interpolation and Hermitian matrices. [9] Poincaré,
as expected, graduated from the University of Paris in 1879, with a thesis relating to
differential equations. He then became a teacher at the University of Caen, where he taught
analysis. He remained there until 1881. He then was appointed as the “maître de conférences
d’analyse” [14] (professor in charge of analysis conferences) at the University of Paris. Also in
that same year he married Miss Poulain d’Andecy. Together they had four children: Jeanne
born in 1887, Yvonne born in 1889, Henriette born in 1891, and finally Léon born in 1893.
He had now returned to work at the Ministry of Public Services as an engineer. He was
responsible for the development of the northern railway. He held that position from 1881 to
1885. This was the last job he held in administration for the government of France. In 1893
he was awarded the title of head engineer in charge of the mines. After that his career awards
and positions continuously increased in prestige and number. He died two years before the
war, on July 17th, 1912, of an embolism at the age of 58. Interestingly, at the beginning of
World War I, his cousin Raymond Poincaré was the president of the French Republic.


[Figure: Jules Henri Poincaré (1854 - 1912)]


Poincaré’s work habits have been compared to a bee flying from flower to flower. Poincaré
was interested in the way his mind worked, and he studied his own habits. He gave a talk about his
observations in 1908 at the Institute of General Psychology in Paris, in which he linked his way of
thinking to how he made several discoveries. His mental organization was not only interesting
to him but also to Toulouse, a psychologist of the Psychology Laboratory of the School of
Higher Studies in Paris. Toulouse wrote a book called Henri Poincaré, which was published
in 1910. He discussed Poincaré’s regular schedule: he worked during the same times each
day, in short periods of time. He never spent a long time on a problem, since he believed
that the subconscious would continue working on the problem while he worked on another
problem. Toulouse also noted that Poincaré had an exceptional memory. In addition, he
stated that most mathematicians worked from principles already established, while Poincaré
was the type that started from basic principles each time. [9] His method of thinking is well
summarized as:


Habitué à négliger les détails et à ne regarder que les cimes, il passait de l’une à
l’autre avec une promptitude surprenante et les faits qu’il découvrait se groupant
d’eux-mêmes autour de leur centre étaient instantanément et automatiquement
classés dans sa mémoire. (He neglected details and jumped from idea to idea; the
facts gathered from each idea would then come together and solve the problem.)


The mathematician Darboux claimed he was “un intuitif” (intuitive) [BA], arguing that this
is demonstrated by the fact that he worked so often by visual representation. He did not
care about being rigorous and disliked logic. He believed that logic was not a way to invent
but a way to structure ideas, and that logic limits ideas.


Poincaré held philosophical views opposite to those of Bertrand Russell and Gottlob Frege, who
believed that mathematics was a branch of logic. Poincaré strongly disagreed, claiming that
intuition was the life of mathematics. Poincaré gives an interesting point of view in his book
Science and Hypothesis:


For a superficial observer, scientific truth is beyond the possibility of doubt; the 
logic of science is infallible, and if the scientists are sometimes mistaken, this is
only from their mistaking its rule. [2] 


Poincaré believed that arithmetic is a synthetic science. He argued that Peano’s axioms
cannot be proven non-circularly with the principle of induction [7], and therefore concluded
that arithmetic is a priori synthetic and not analytic. Poincaré then went on to say that
mathematics cannot be deduced from logic since it is not analytic. It is important to
note that even today Poincaré has not been proven wrong in his argumentation. His views
were the same as those of Kant [8]. However, Poincaré did not share Kantian views in all
branches of philosophy and mathematics. For example, in geometry Poincaré believed that
the structure of non-Euclidean space can be known analytically. He wrote three books that made
his philosophies known: Science and Hypothesis, The Value of Science and Science
and Method.


Poincaré’s first area of interest in mathematics was the Fuchsian functions, which he named after
the mathematician Lazarus Fuchs because Fuchs was known for being a good teacher and had done
a lot of research in differential equations and in the theory of functions. The functions did
not keep the name Fuchsian and are today called automorphic. Poincaré actually developed
the concept of those functions as part of his doctoral thesis. [9] An automorphic function is a
function $f(z)$, where $z \in \mathbb{C}$, which is analytic in its domain and which is invariant under
a group of linear fractional transformations; such functions are generalizations
of trigonometric functions and elliptic functions. [15] Below, Poincaré explains how he
discovered Fuchsian functions:


For fifteen days I strove to prove that there could not be any functions like those 
I have since called Fuchsian functions. I was then very ignorant; every day I 
seated myself at my work table, stayed an hour or two, tried a great number 
of combinations and reached no results. One evening, contrary to my custom, I 
drank black coffee and could not sleep. Ideas rose in crowds; I felt them collide 
until pairs interlocked, so to speak, making a stable combination. By the next 
morning I had established the existence of a class of Fuchsian functions, those 
which come from the hypergeometric series; I had only to write out the results, 
which took but a few hours. [1]


This is a clear indication of Henri Poincaré’s brilliance. Poincaré communicated a lot with Klein,
another mathematician working on Fuchsian functions. They were able to discuss and further
the theory of automorphic (Fuchsian) functions. Apparently Klein became jealous of Poincaré’s
high opinion of Fuchs’s work and ended their relationship on bad terms.


Poincaré contributed to the field of topology and published Analysis situs in 1895,
which was the first real systematic look at topology. He acquired most of his knowledge
from his work on differential equations. He also formulated the Poincaré conjecture, one
of the great unsolved mathematics problems. It is currently one of the “Millennium Prize
Problems”. The problem is stated as:
Consider a compact 3-dimensional manifold V without boundary. Is it possible
that the fundamental group of V could be trivial, even though V is not
homeomorphic to the 3-dimensional sphere? [5]


The problem has been attacked by many mathematicians, such as Henry Whitehead in 1934,
but without success. Later, in the 50’s and 60’s, progress was made and it was discovered
that for higher-dimensional manifolds the problem was easier. (Theorems have been stated for
those higher dimensions by Stephen Smale, John Stallings, Andrew Wallace, and many more.)
[5] Poincaré also studied homotopy theory, which is the study of topology reduced to various
groups that are algebraically invariant. [9] He introduced the fundamental group in a paper
in 1894, and later stated his famous conjecture. He also did work in analytic functions,
algebraic geometry, and Diophantine problems, where, as in most of the areas he studied,
he made important contributions.


In 1887, Oscar II, King of Sweden and Norway, held a competition to celebrate his sixtieth
birthday and to promote higher learning. [1] The King wanted a contest that would be of
interest, so he decided to hold a mathematics competition. Poincaré entered the competition,
submitting a memoir on the three-body problem, which he describes as:


Le but final de la Mécanique céleste est de résoudre cette grande question de
savoir si la loi de Newton explique à elle seule tous les phénomènes astronomiques;
le seul moyen d’y parvenir est de faire des observations aussi précises que possible
et de les comparer ensuite aux résultats du calcul. (The goal of celestial mechanics
is to answer the great question of whether Newtonian mechanics explains all
astronomical phenomena. The only way this can be proven is by taking the
most precise observations and comparing them to the theoretical calculations.) [13]


Poincaré did in fact win the competition. In his memoir he described new mathematical ideas
such as homoclinic points. The memoir was about to be published in Acta Mathematica
when an error was found by the editor. This error in fact led to the discovery of chaos
theory. The memoir was published later, in 1890. In addition, Poincaré proved that
determinism and predictability were distinct problems. He also found that the solution of
the three-body problem would change drastically with a small change in the initial conditions.
This area of research was neglected until 1963, when Edward Lorenz discovered a famous
chaotic deterministic system using a simple model of the atmosphere. [7]


He also made many contributions to different fields of applied mathematics, such as celestial
mechanics, fluid mechanics, optics, electricity, telegraphy, capillarity, elasticity,
thermodynamics, potential theory, quantum theory, theory of relativity and cosmology. In the field
of differential equations Poincaré gave many results that are critical for the qualitative
theory of differential equations, for example the Poincaré sphere and the Poincaré map.


It is that intuition that led him to discover and study so many areas of science. Poincaré
is considered to be the next universalist after Gauss. After Gauss’s death in 1855 people
generally believed that there would be no one else who could master all branches of
mathematics. However, they were wrong, because Poincaré took all areas of mathematics as “his
province” [4].
Chapter 10


01A60 — 20th century 


10.1 Bourbaki, Nicolas 


by Émilie Richer 
The Problem 


The devastation of World War I presented a unique challenge to aspiring mathematicians of
the mid 1920’s. Among the many casualties of the war were great numbers of scientists and
mathematicians who would at this time have been serving as mentors to the young students.
Whereas other countries such as Germany were sending their scholars to do scientific work,
France was sending promising young students to the front. A war-time directory of the
École Normale Supérieure in Paris confirms that about 2/3 of their student population was
killed in the war. [DJ] Young men studying after the war had no young teachers; they had
no previous generation to rely on for guidance. What did this mean? According to Jean
Dieudonné, it meant that students like him were missing out on important discoveries and
advances being made in mathematics at that time. He explained: “I am not saying that
they (the older professors) did not teach us excellent mathematics (...) But it is indubitable
that a 50 year old mathematician knows the mathematics he learned at 20 or 30, but has
only notions, often rather vague, of the mathematics of his epoch, i.e. the period of time
when he is 50.” He continued: “I had graduated from the École Normale and I did not know
what an ideal was! This gives you an idea of what a young French mathematician knew in
1930.” Henri Cartan, another student in Paris shortly after the war, affirmed: “we were
the first generation after the war. Before us there was a vide, a vacuum, and it was necessary
to make everything new.” [JA] This is exactly what a few young Parisian math students set
out to do.


The Beginnings 


After graduation from the École Normale Supérieure de Paris, a group of about ten young
mathematicians had maintained very close ties. [WA] They had all begun their careers and
were scattered across France teaching in universities. Among them were Henri Cartan and
André Weil, who were both in charge of teaching a course on differential and integral
calculus at the University of Strasbourg. The standard textbook for this class at the time was
“Traité d’Analyse” by E. Goursat, which the young professors found to be inadequate in
many ways. [BA] According to Weil, his friend Cartan was constantly asking him questions
about the best way to present a given topic to his class, so much so that Weil eventually
nicknamed him “the grand inquisitor”. [WA] After months of persistent questioning, in the
winter of 1934, Weil finally got the idea to gather friends (and former classmates) to settle
their problem by rewriting the treatise for their course. It is at this moment that Bourbaki
was conceived.


The suggestion of writing this treatise spread, and very soon a loose circle of friends,
including Henri Cartan, André Weil, Jean Delsarte, Jean Dieudonné and Claude Chevalley, began
meeting regularly at the Capoulade, a café in the Latin quarter of Paris, to plan it. They
called themselves the “Committee on the Analysis Treatise” [BL]. According to Chevalley
the project was extremely naive. The idea was to simply write another textbook to replace
Goursat’s. [GD] After many discussions over what to include in their treatise they finally
came to the conclusion that they needed to start from scratch and present all of essential
mathematics from beginning to end, with the idea that “the work had to be primarily a
tool, not usable in some small part of mathematics but in the greatest possible number of
places”. [DJ] Gradually the young men realized that their meetings were not sufficient, and
they decided they would dedicate a few weeks in the summer to their new project. The
collaborators on this project were not aware of its enormity, but were soon to find out.


In July of 1935 the young men gathered for their first congress (as these meetings would later be called)
in Besse-en-Chandesse. The men believed that they would be able to draft the essentials of
mathematics in about three years. They did not set out wanting to write something new,
but to perfect everything already known. Little did they know that their first chapter would
not be completed until 4 years later. It was at one of their first meetings that the young
men chose their name: Nicolas Bourbaki. The organization and its membership would go
on to become one of the greatest enigmas of 20th century mathematics.


The first Bourbaki congress, July 1935. From left to right, back row: Henri Cartan, René 
de Possel, Jean Dieudonné, André Weil, university lab technician, seated: Mirlés, Claude 
Chevalley, Szolem Mandelbrojt. 


André Weil recounts many years later how they decided on this name. He and a few other
Bourbaki collaborators had been attending the École Normale in Paris when a notification
was sent out to all first year science students: a guest speaker would be giving a lecture and
attendance was highly recommended. As the story goes, the young students gathered to
hear (unbeknownst to them) an older student, Raoul Husson, who had disguised himself with
a fake beard and an unrecognizable accent. He gave what is said to be an incomprehensible,
nonsensical lecture, with the young students trying desperately to follow him. All his results
were wrong in a non-trivial way and he ended with his most extravagant result: Bourbaki’s
Theorem. One student even claimed to have followed the lecture from beginning to end.
Raoul had taken the name for his theorem from a general in the Franco-Prussian war. The
committee was so amused by the story that they unanimously chose Bourbaki as their name.
Weil’s wife was present at the discussion about choosing a name and she became Bourbaki’s
godmother, baptizing him Nicolas. [WA] Thus was born Nicolas Bourbaki.


André Weil, Claude Chevalley, Jean Dieudonné, Henri Cartan and Jean Delsarte were among
the few present at these first meetings, and they were all active members of Bourbaki until their
retirements. Today they are considered by most to be the founding fathers of the Bourbaki
group. According to a later member they were “those who shaped Bourbaki and gave it
much of their time and thought until they retired”; he also claims that some other early
contributors were Szolem Mandelbrojt and René de Possel. [BA]


Reforming Mathematics : The Idea 


Bourbaki members all believed that they had to completely rethink mathematics. They felt
that older mathematicians were holding on to old practices and ignoring the new. That is
why very early on Bourbaki established one of its first and only rules: obligatory retirement
at age 50. As explained by Dieudonné, “if the mathematics set forth by Bourbaki no longer
correspond to the trends of the period, the work is useless and has to be redone, this is why
we decided that all Bourbaki collaborators would retire at age 50.” Bourbaki wanted to
create a work that would be an essential tool for all mathematicians. Their aim was to create
something logically ordered, starting with a strong foundation and building continuously on
it. The foundation that they chose was set theory, which would be the first book in a series
of 6 that they named “Éléments de mathématique” (with the ’s’ dropped from mathématique
to represent their underlying belief in the unity of mathematics). Bourbaki felt that the old
mathematical divisions were no longer valid, comparing them to ancient zoological divisions.
The ancient zoologist would classify animals based on some basic superficial similarities such
as “all these animals live in the ocean”. Eventually zoologists realized that more complexity
was required to classify these animals. Past mathematicians had apparently made similar
mistakes: “the order in which we (Bourbaki) arranged our subjects was decided according to
a logical and rational scheme. If that does not agree with what was done previously, well, it
means that what was done previously has to be thrown overboard.” [DJ] After many heated
discussions, Bourbaki eventually settled on the topics for “Éléments de mathématique”. They
would be, in order:


I Set theory
II Algebra
III Topology
IV Functions of one variable
V Topological vector spaces
VI Integration
They now felt that they had eliminated all secondary mathematics that, according to them,
“did not lead to anything of proved importance.” [DJ] The following table summarizes
Bourbaki’s choices.


What remains after cutting the loose threads   | What is excluded (the loose threads)
Linear and multilinear algebra                 |
A little general topology (the least possible) | Lattices
Topological vector spaces                      | Most general topology
Homological algebra                            | Most of group theory, finite groups
Commutative algebra                            | Most of number theory
Non-commutative algebra                        | Trigonometrical series
                                               | Interpolation
Integration                                    | Series of polynomials
                                               | Applied mathematics
Riemannian geometry                            |


Dieudonné’s metaphorical ball of yarn: “here is my picture of mathematics now. It 
is a ball of wool, a tangled hank where all mathematics react upon another in an almost 
unpredictable way. And then in this ball of wool, there are a certain number of threads coming 
out in all directions and not connecting with anything else. Well the Bourbaki method is very 
simple-we cut the threads.” [DJ] 


Reforming Mathematics : The Process 


It didn’t take long for Bourbaki to become aware of the enormity of their project. They were
now meeting three times a year (twice for one week and once for two weeks) for Bourbaki
“congresses” to work on their books. Their main rule was unanimity on every point. Any
member had the right to veto anything he felt was inadequate or imperfect. Once Bourbaki
had agreed on a topic for a chapter, the job of writing up the first draft was given to any
member who wanted it. He would write his version and when it was complete it would be
presented at the next Bourbaki congress. It would be read aloud line by line. According
to Dieudonné, “each proof was examined point by point and criticized pitilessly.” He goes
on: “one has to see a Bourbaki congress to realize the virulence of this criticism and how it
surpasses by far any outside attack.” Weil recalls a first draft written by Cartan (who was
unable to attend the congress where it would be presented). Bourbaki sent him a telegram
summarizing the congress; it read: “union intersection partie produit tu es démembré foutu
Bourbaki” (union intersection subset product you are dismembered screwed Bourbaki). [WA]
During a congress any member was allowed to interrupt to criticize, comment or ask questions
at any time. Apparently Bourbaki believed it could get better results from confrontation
than from orderly discussion. [BA] Armand Borel summarized his first congress as “two or
three monologues shouted at top voice, seemingly independent of one another”. [BA]


Bourbaki congress 1951. 


After a first draft had been completely reduced to pieces it was the job of a new collaborator
to write up a second draft. This second collaborator would use all the suggestions and
changes that the group had put forward during the congress. Any member had to be able to
take on this task because one of Bourbaki’s mottoes was “the control of the specialists by the
non-specialists” [BA], i.e. a member had to be able to write a chapter in a field that was not
his specialty. This second writer would set out on his assignment knowing that by the time
he was ready to present his draft the views of the congress would have changed and his draft
would also be torn apart despite its adherence to the congress’s earlier suggestions. The
same chapter might appear up to ten times before it would finally be unanimously approved
for publishing. There was an average of 8 to 12 years from the time a chapter was approved
to the time it appeared on a bookshelf. [DJ] Bourbaki proceeded this way for over twenty
years, (surprisingly) publishing a great number of volumes.


Bourbaki congress 1951. 


Recruitment and Membership 


During these years, most Bourbaki members held permanent positions at universities across
France. There, they could recruit for Bourbaki students showing great promise in
mathematics. Members would never be replaced formally, nor was there ever a fixed number of
members. However, when it felt the need, Bourbaki would invite a student or colleague to a
congress as a “cobaye” (guinea pig). To be accepted, not only would the guinea pig have
to understand everything, but he would have to actively participate. He also had to show
broad interests and an ability to adapt to the Bourbaki style. If he was silent he would
not be invited again. (A challenging task, considering he would be in the presence of some of
the strongest mathematical minds of the time.) Bourbaki described the reaction of certain
guinea pigs invited to a congress: “they would come out with the impression that it was a
gathering of madmen. They could not imagine how these people, shouting -sometimes three
or four at a time- about mathematics, could ever come up with something intelligent.”
If a new recruit was showing promise, he would continue to be invited and would gradually
become a member of Bourbaki without any formal announcement. Although complete anonymity
was impossible, Bourbaki was never discussed with the outside world. It was many
years before Bourbaki members agreed to speak publicly about their story. The following
table gives the names of some of Bourbaki’s collaborators.


1st generation (founding fathers) | 2nd generation (invited after WWII) | 3rd generation
H. Cartan                         | J. Dixmier                          | A. Borel
C. Chevalley                      | R. Godement                         | F. Bruhat
J. Delsarte                       | S. Eilenberg                        | P. Cartier
J. Dieudonné                      | J.L. Koszul                         | A. Grothendieck
A. Weil                           | P. Samuel                           | S. Lang
                                  | J.P. Serre                          | J. Tate
                                  | L. Schwartz                         |


3 Generations of Bourbaki (membership according to Pierre Cartier) [SM]. Note: There
have been a great number of Bourbaki contributors, some lasting longer than others; this table
gives the members listed by Pierre Cartier. Different sources list different “official members”;
in fact the Bourbaki website lists J. Coulomb, C. Ehresmann, R. de Possel and S. Mandelbrojt
as 1st generation members. [BW]
Bourbaki congress 1938, from left to right: S. Weil, C. Pisot, A. Weil, J. Dieudonné, C.
Chabauty, C. Ehresmann, J. Delsarte.


The Books 


The Bourbaki books were the first to have such a tight organization and the first to use an
axiomatic presentation. They tried as often as possible to start from the general and work
towards the particular, working with the belief that mathematics is fundamentally simple
and that for each mathematical question there is an optimal way of answering it. This
required an extremely rigid structure and notation. In fact the first six books of “Éléments de
mathématique” use a completely linearly-ordered reference system. That is, any reference
at a given spot can only be to something earlier in the text or in an earlier book. This
did not please all of its readers, as Borel elaborates: “I was rather put off by the very dry
style, without any concession to the reader, the apparent striving for the utmost generality,
the inflexible system of internal references and the total absence of outside ones”. However,
Bourbaki’s style was in fact so efficient that a lot of its notation and vocabulary is still
in current usage. Weil recalls that his granddaughter was impressed when she learned that
he had been personally responsible for the symbol Ø for the empty set [WA], and Chevalley
explains that to “bourbakise” now means to take a text that is considered screwed up and
to arrange it and improve it, concluding that “it is the notion of structure which is truly
bourbakique”. [GD]


As well as Ø, Bourbaki is responsible for the introduction of $\Rightarrow$ (the implication arrow),
$\mathbb{N}$, $\mathbb{R}$, $\mathbb{C}$, $\mathbb{Q}$ and $\mathbb{Z}$ (respectively the natural, real, complex and rational numbers, and the integers),
$C_A$ (the complement of a set $A$), as well as the words bijective, surjective and injective. [DR]


The Decline 


Once Bourbaki had finally finished its first six books, the obvious question was “what next?”
The founding members who (not intentionally) had often carried most of the weight were now
approaching mandatory retirement age. The group had to start looking at more specialized
topics, having covered the basics in their first books. But was the highly structured Bourbaki
style the best way to approach these topics? The motto “everyone must be interested in
everything” was becoming much more difficult to enforce. (It was easy for the first six
books, whose contents are considered essential knowledge for most mathematicians.) Pierre
Cartier was working with Bourbaki at this point. He says “in the forties you can say that
Bourbaki knew where to go: his goal was to provide the foundation for mathematics”. [12] It
seemed now that they did not know where to go. Nevertheless, Bourbaki kept publishing.
Its second series (falling short of Dieudonné’s plan of 27 books encompassing most of modern
mathematics [BA]) consisted of two very successful books:


Book VII Commutative algebra 
Book VIII Lie Groups


However, Cartier claims that by the end of the seventies, Bourbaki’s method was understood,
and many textbooks were being written in its style: “Bourbaki was left without a task. (...)
With their rigid format they were finding it extremely difficult to incorporate new
mathematical developments” [SM]. To add to its difficulties, Bourbaki was now becoming involved
in a battle with its publishing company over royalties and translation rights. The matter was
settled in 1980 after a “long and unpleasant” legal process, where, as one Bourbaki member
put it, “both parties lost and the lawyer got rich” [SM]. In 1983 Bourbaki published its last
volume: IX Spectral Theory.


By that time, Cartier says, Bourbaki was a dinosaur, the head too far away from the tail.
He explains: “when Dieudonné was the “scribe of Bourbaki” every printed word came from
his pen. With his fantastic memory he knew every single word. You could say “Dieudonné
what is the result about so and so?” and he would go to the shelf and take down the book and
open it to the right page. After Dieudonné retired no one was able to do this. So Bourbaki
lost awareness of his own body, the 40 published volumes.” [SM] Now, after almost twenty
years without a significant publication, is it safe to say the dinosaur has become extinct?
But since Nicolas Bourbaki never in fact existed, and was nothing but a clever teaching and
research ploy, could he ever be said to be extinct?
Version: 6 Owner: Daume Author(s): Daume


10.2 Erdős Number


A low Erdős number is a status symbol among 20th century mathematicians and is similar
to the 6-degrees-of-separation concept.


Let $e(p)$ be the Erdős number of person $p$. Your Erdős number is

• 0 if you are Paul Erdős,
• $\min\{e(x) \mid x \in X\} + 1$, where $X$ is the set of all persons you have authored a paper
with.
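
The recursive definition above amounts to a shortest-path distance from Paul Erdős in the co-authorship graph, so it can be computed with a breadth-first search. The following is a minimal sketch under that reading; the function name `erdos_numbers` and the small collaboration graph (persons "A" to "E") are hypothetical and serve only as an illustration.

```python
from collections import deque


def erdos_numbers(coauthors, root="Paul Erdős"):
    """Return {person: Erdős number} for everyone connected to `root`.

    `coauthors` maps each person to the set of people they have written
    a paper with (assumed symmetric). People with no chain of co-authors
    leading to `root` are left out of the result.
    """
    numbers = {root: 0}
    queue = deque([root])
    while queue:
        person = queue.popleft()
        for other in coauthors.get(person, ()):
            if other not in numbers:
                # First visit happens along a shortest chain, so this is
                # min{e(x) | x in X} + 1 for `other`.
                numbers[other] = numbers[person] + 1
                queue.append(other)
    return numbers


# Hypothetical collaboration graph, for illustration only.
coauthors = {
    "Paul Erdős": {"A", "B"},
    "A": {"Paul Erdős", "C"},
    "B": {"Paul Erdős", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),
}
numbers = erdos_numbers(coauthors)
```

In this sketch a person with no path to Erdős (such as "E") simply has no entry; the definition above leaves that case unspecified.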


Version: 7 Owner: tz26 Author(s): tz26 
Chapter 11


03-00 — General reference works 
(handbooks, dictionaries, 
bibliographies, etc.) 


11.1 Burali-Forti paradox 


The Burali-Forti paradox demonstrates that the class of all ordinals is not a set. If there
were a set of all ordinals, Ord, then it would follow that Ord was itself an ordinal, and
therefore that $\mathrm{Ord} \in \mathrm{Ord}$. Even if sets in general are allowed to contain themselves, ordinals
cannot, since they are defined so that $\in$ is well founded over them.


This paradox is similar to both Russell’s paradox and Cantor’s paradox, although it predates
both. All of these paradoxes prove that a certain object is “too large” to be a set.
11.2 Cantor’s paradox


Cantor’s paradox demonstrates that there can be no largest cardinality. In particular,
there must be an unlimited number of infinite cardinalities. For suppose that $\alpha$ were the
largest cardinal. Then we would have $|\mathcal{P}(\alpha)| = |\alpha|$. Suppose $f : \alpha \to \mathcal{P}(\alpha)$ is a bijection
proving their equicardinality. Then $X = \{\beta \in \alpha \mid \beta \notin f(\beta)\}$ is a subset of $\alpha$, and so there
is some $\gamma \in \alpha$ such that $f(\gamma) = X$. But $\gamma \in X \leftrightarrow \gamma \notin X$, which is a paradox.


The key part of the argument strongly resembles Russell’s paradox, which is in some sense
a generalization of this paradox.
Besides allowing an unbounded number of cardinalities, as ZF set theory does, this paradox
could be avoided by a few other tricks, for instance by not allowing the construction of a
power set or by adopting paraconsistent logic.
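
The diagonal set used in the argument can be examined concretely on small finite sets. The sketch below is only an illustration, not a proof of the infinite case: it enumerates every map from a small set into its power set and checks that the diagonal set $\{\beta \mid \beta \notin f(\beta)\}$ is never a value of the map. The helper names `powerset`, `diagonal_set` and `no_map_is_onto` are hypothetical.

```python
from itertools import product


def powerset(domain):
    """Return all subsets of `domain` as a list of frozensets."""
    items = list(domain)
    return [
        frozenset(x for x, keep in zip(items, mask) if keep)
        for mask in product([False, True], repeat=len(items))
    ]


def diagonal_set(f, domain):
    """Return {x in domain : x not in f(x)}, with f given as a dict."""
    return frozenset(x for x in domain if x not in f[x])


def no_map_is_onto(domain):
    """Check by brute force that no map domain -> P(domain) hits its diagonal set."""
    domain = list(domain)
    subsets = powerset(domain)
    for images in product(subsets, repeat=len(domain)):
        f = dict(zip(domain, images))
        if diagonal_set(f, domain) in f.values():
            return False
    return True


# A single example map on {0, 1, 2} and its diagonal set.
example = {0: frozenset({0}), 1: frozenset(), 2: frozenset({0, 1})}
example_diagonal = diagonal_set(example, [0, 1, 2])
all_fail = no_map_is_onto(range(3))
```

The search is exponential (for three elements there are $8^3$ maps), so it is only meant for very small sets.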
Version: 2 Owner: Henry Author(s): Henry 


11.3 Russell’s paradox 


Suppose that for any coherent proposition $P(x)$, we can construct a set $\{x : P(x)\}$. Let
$S = \{x : x \notin x\}$. Suppose $S \in S$; then, by definition, $S \notin S$. Likewise, if $S \notin S$, then by
definition $S \in S$. Therefore, we have a contradiction. Bertrand Russell gave this paradox as
an example of how a purely intuitive set theory can be inconsistent. The axiom of regularity,
one of the Zermelo-Fraenkel axioms, was devised to avoid this paradox by prohibiting
self-swallowing sets.


An interpretation of Russell’s paradox without any formal language of set theory could be
stated like “If the barber shaves all those who do not themselves shave, does he shave himself?”
If you answer “himself”, that is false, since he only shaves those who do not themselves
shave. If you answer “someone else”, that is also false, because he shaves all those who do not
themselves shave, and in this case he is part of that set since he does not shave himself.
Therefore we have a contradiction.
11.4 biconditional

A biconditional is a truth function that is true only in the case that both parameters are true or both are false. For example, "a if and only if b", "a just in case b", as well as "b implies a and a implies b" are all ways of stating a biconditional in English. Symbolically the biconditional is written as

a ↔ b

or

a ⇔ b

Its truth table is

a b a↔b
F F T
F T F
T F F
T T T

In addition, the biconditional function is sometimes written as "iff", meaning "if and only if".

The biconditional gets its name from the fact that it is really two conditionals in conjunction:

(a → b) ∧ (b → a)


This fact is important to recognize when writing a mathematical proof, as both conditionals 
must be proven independently. 
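The claim that a biconditional is two conditionals in conjunction can be checked mechanically over all four input combinations. This is a minimal sketch; the helper names `implies` and `iff` are chosen here for illustration.

```python
from itertools import product

def implies(a, b):
    """The conditional: false only when a is true and b is false."""
    return (not a) or b

def iff(a, b):
    """The biconditional: true when both arguments have the same truth value."""
    return a == b

# Rows of the truth table in the order FF, FT, TF, TT.
rows = [(a, b, iff(a, b)) for a, b in product([False, True], repeat=2)]

# Compare the biconditional with the conjunction of the two conditionals.
agrees = all(iff(a, b) == (implies(a, b) and implies(b, a)) for a, b, _ in rows)
```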
11.5 bijection

Let X and Y be sets. A function f : X → Y that is one-to-one and onto is called a bijection or bijective function from X to Y.

When X = Y, f is also called a permutation of X.


Version: 8 Owner: mathcam Author(s): mathcam, drini 


11.6 cartesian product 


For any sets A and B, the cartesian product A × B is the set consisting of all ordered pairs (a, b) where a ∈ A and b ∈ B.


Version: 1 Owner: djao Author(s): djao 


11.7 chain 


Let B ⊆ A, where A is ordered by ≤. B is a chain in A if any two elements of B are
comparable. 


That is, B is a linearly ordered subset of A. 


Version: 1 Owner: akrowne Author(s): akrowne 
11.8 characteristic function

Definition Suppose A is a subset of a set X. Then the function

χ_A(x) = 1, when x ∈ A,
χ_A(x) = 0, when x ∈ X \ A

is the characteristic function for A.

Properties

Suppose A, B are subsets of a set X.

1. For set intersections and set unions, we have

χ_{A∩B} = χ_A χ_B,
χ_{A∪B} = χ_A + χ_B − χ_{A∩B}.

2. For the symmetric difference,

χ_{A△B} = χ_A + χ_B − 2χ_{A∩B}.

3. For the set complement,

χ_{A^c} = 1 − χ_A.

Remarks

A synonym for characteristic function is indicator function [1].
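The identities above can be verified pointwise on a small example. The sets `X`, `A`, `B` and the helper `chi` below are assumptions made for illustration only.

```python
def chi(A):
    """Return the characteristic function of the set A."""
    return lambda x: 1 if x in A else 0

X = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Check intersection, union, symmetric difference and complement at every point of X.
checks = all(
    chi(A & B)(x) == chi(A)(x) * chi(B)(x)
    and chi(A | B)(x) == chi(A)(x) + chi(B)(x) - chi(A & B)(x)
    and chi(A ^ B)(x) == chi(A)(x) + chi(B)(x) - 2 * chi(A & B)(x)
    and chi(X - A)(x) == 1 - chi(A)(x)
    for x in X
)
```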
11.9 concentric circles

A collection of circles is said to be concentric if they have the same center. The region formed between two concentric circles is therefore an annulus.
11.10 conjunction

A conjunction is true only when both parameters (called conjuncts) are true. In English, conjunction is denoted by the word "and". Symbolically, we represent it as ∧ or multiplication applied to Boolean parameters. Conjunction of a and b would be written

a ∧ b

or, in algebraic context,

a · b

or

ab

The truth table for conjunction is

a b a∧b
F F F
F T F
T F F
T T T
11.11 disjoint

Two sets X and Y are disjoint if their intersection X ∩ Y is the empty set.
Version: 1 Owner: djao Author(s): djao 

11.12 empty set 


An empty set ∅ is a set that contains no elements. The Zermelo-Fraenkel axioms of set theory imply that there exists an empty set.


Version: 2 Owner: djao Author(s): djao 
11.13 even number

Definition Suppose k is an integer. If there exists an integer r such that k = 2r + 1, then k is an odd number. If there exists an integer r such that k = 2r, then k is an even number.

The concept of even and odd numbers is most easily understood in the binary base. Then the above definition simply states that even numbers end with a 0, and odd numbers end with a 1.

Properties

1. Every integer is either even or odd. This can be proven using induction, or using the fundamental theorem of arithmetic.

2. An integer k is even (odd) if and only if k² is even (odd).
11.14 fixed point

A fixed point x of a function f : X → X is a point that remains constant upon application of that function, i.e.:

f(x) = x.
11.15 infinite

A set S is infinite if it is not finite; that is, there is no n ∈ N for which there is a bijection between n and S. Hence an infinite set has a cardinality greater than any natural number:

|S| ≥ ℵ_0

Infinite sets can be divided into countable and uncountable. For countably infinite sets S, there is a bijection between S and N. This is not the case for uncountably infinite sets (like the reals and any non-trivial real interval).

Some examples of finite sets:

• The empty set {}.
• {0, 1}
• {1, 2, 3, 4, 5}
• {1, 1.5, e, π}

Some examples of infinite sets:

• {1, 2, 3, 4, ...} (countable)
• The primes {2, 3, 5, 7, 11, ...} (countable)
• An interval of the reals: (0, 1) (uncountable)
• The rational numbers: Q (countable)
11.16 injective function

We say that a function f : X → Y is injective or one-to-one if f(x) = f(y) implies x = y, or equivalently, whenever x ≠ y, then f(x) ≠ f(y).
11.17 integer

The set of integers, denoted by the symbol Z, is the set {..., −3, −2, −1, 0, 1, 2, 3, ...} consisting of the natural numbers and their negatives.

Mathematically, Z is defined to be the set of equivalence classes of pairs of natural numbers N × N under the equivalence relation (a, b) ∼ (c, d) if a + d = b + c.


Addition and multiplication of integers are defined as follows: 


• (a, b) + (c, d) := (a + c, b + d)
• (a, b) · (c, d) := (ac + bd, ad + bc)

Typically, the class of (a, b) is denoted by the symbol n if b ≤ a (resp. −n if a ≤ b), where n is the unique natural number such that a = b + n (resp. a + n = b). Under this notation, we recover the familiar representation of the integers as {..., −3, −2, −1, 0, 1, 2, 3, ...}. Here are some examples:

• 0 = equivalence class of (0, 0) = equivalence class of (1, 1) = ...
• 1 = equivalence class of (1, 0) = equivalence class of (2, 1) = ...
• −1 = equivalence class of (0, 1) = equivalence class of (1, 2) = ...

The set of integers Z under the addition and multiplication operations defined above forms an integral domain. The integers admit the following ordering relation making Z into an ordered ring: (a, b) ≤ (c, d) in Z if a + d ≤ b + c in N.

The ring of integers is also a Euclidean domain, with valuation given by the absolute value function.
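The pair construction can be sketched in code. The following assumes pairs of Python non-negative integers stand in for elements of N × N; the helpers `normalize`, `add`, `mul`, `equivalent` and `to_int` are names introduced here for illustration, and `to_int` uses ordinary subtraction only to display the result.

```python
def normalize(p):
    """Pick the representative of the class of (a, b) with a zero component."""
    a, b = p
    m = min(a, b)
    return (a - m, b - m)

def equivalent(p, q):
    """(a, b) ~ (c, d) exactly when a + d == b + c."""
    return p[0] + q[1] == p[1] + q[0]

def add(p, q):
    """(a, b) + (c, d) := (a + c, b + d)."""
    return normalize((p[0] + q[0], p[1] + q[1]))

def mul(p, q):
    """(a, b) * (c, d) := (ac + bd, ad + bc)."""
    (a, b), (c, d) = p, q
    return normalize((a * c + b * d, a * d + b * c))

def to_int(p):
    """Read the class of (a, b) as the ordinary integer a - b."""
    return p[0] - p[1]

minus_one = (0, 1)
two = (2, 0)
product_of_negatives = mul(minus_one, minus_one)  # class of 1
```

Working with a normalized representative is a design choice for readability; any representative of the same class would give an equivalent result.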
11.18 inverse function

Definition Suppose f : X → Y is a function between sets X and Y, and suppose f^{-1} : Y → X is a mapping that satisfies

f^{-1} ∘ f = id_X,
f ∘ f^{-1} = id_Y.

Then f^{-1} is called the inverse of f, or the inverse function of f.

Remarks

1. The inverse function of a function f : X → Y exists if and only if f is a bijection, that is, f is an injection and a surjection.

2. When an inverse function exists, it is unique.

3. The inverse function and the inverse image of a set coincide in the following sense. Suppose f^{-1}(A) is the inverse image of a set A ⊆ Y under a function f : X → Y. If f is a bijection, then f^{-1}(y) = f^{-1}({y}).


Version: 3 Owner: matte Author(s): matte 
11.19 linearly ordered

An ordering ≤ (or <) of A is called linear or total if any two elements of A are comparable. The pair (A, ≤) is then called a linearly ordered set.


Version: 1 Owner: akrowne Author(s): akrowne 


11.20 operator 


Synonym of mapping and function. Often used to refer to mappings where the domain and codomain are, in some sense, a space of functions.

Examples: differential operator, convolution operator.
11.21 ordered pair

For any sets a and b, the ordered pair (a, b) is the set {{a}, {a, b}}.

The characterizing property of an ordered pair is:

(a, b) = (c, d) ⇔ a = c and b = d,

and the above construction of ordered pair, as weird as it seems, is actually the simplest possible formulation which achieves this property.
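The characterizing property can be checked for this construction on a few small values. This is a sketch using Python `frozenset` objects as a stand-in for sets; the function name `pair` is an assumption for the example.

```python
def pair(a, b):
    """Build the set {{a}, {a, b}} representing the ordered pair (a, b)."""
    return frozenset({frozenset({a}), frozenset({a, b})})

values = range(3)
# (a, b) = (c, d) should hold exactly when a = c and b = d.
characterizing = all(
    (pair(a, b) == pair(c, d)) == (a == c and b == d)
    for a in values for b in values for c in values for d in values
)
```

Note that when a = b the set collapses to {{a}}, and the property still holds.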
11.22 ordering relation

Let S be a set. An ordering relation is a relation ≤ on S such that, for every a, b, c ∈ S:

• Either a ≤ b, or b ≤ a,
• If a ≤ b and b ≤ c, then a ≤ c,
• If a ≤ b and b ≤ a, then a = b.

Given an ordering relation ≤, one can define a relation < by: a < b if a ≤ b and a ≠ b. The opposite ordering is the relation ≥ given by: a ≥ b if b ≤ a, and the relation > is defined analogously.
11.23 partition

A partition P of a set S is a collection of mutually disjoint non-empty sets such that ⋃P = S.

Any partition P of a set S introduces an equivalence relation on S, where each p ∈ P is an equivalence class. Similarly, given an equivalence relation on S, the collection of distinct equivalence classes is a partition of S.
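The passage from an equivalence relation to a partition can be illustrated on a finite set. In this sketch the relation "same remainder mod 3" and the helper names are assumptions chosen for the example.

```python
def classes_of(S, related):
    """Group the elements of S into classes of an equivalence relation
    given as a two-argument predicate."""
    classes = []
    for x in S:
        for c in classes:
            # Comparing with one representative suffices for an equivalence relation.
            if related(x, next(iter(c))):
                c.add(x)
                break
        else:
            classes.append({x})
    return [frozenset(c) for c in classes]

def is_partition(P, S):
    """Check that P consists of non-empty, mutually disjoint sets whose union is S."""
    P = list(P)
    nonempty = all(len(p) > 0 for p in P)
    disjoint = all(p.isdisjoint(q) for i, p in enumerate(P) for q in P[i + 1:])
    covers = set().union(*P) == set(S)
    return nonempty and disjoint and covers

S = range(10)
P = classes_of(S, lambda x, y: x % 3 == y % 3)
```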
11.24 pullback

Definition Suppose X, Y, Z are sets, and we have maps

f : Y → Z,
Φ : X → Y.

Then the pullback of f under Φ is the mapping

Φ*f : X → Z,
x ↦ (f ∘ Φ)(x).

Let us denote by M(X, Y) the set of all mappings f : X → Y. We then see that Φ* is a mapping M(Y, Z) → M(X, Z). In other words, Φ* pulls back the set where f is defined on from Y to X. This is illustrated in the below diagram.

[Diagram: Φ maps X to Y, f maps Y to Z, and Φ*f maps X directly to Z.]


Properties 


1. For any set X, (id_X)* = id_{M(X,X)}.

2. Suppose we have maps

Φ : X → Y,
Ψ : Y → Z

between sets X, Y, Z. Then

(Ψ ∘ Φ)* = Φ* ∘ Ψ*.

3. If Φ : X → Y is a bijection, then Φ* is a bijection and

(Φ*)^{-1} = (Φ^{-1})*.

4. Suppose X, Y are sets with X ⊂ Y. Then we have the inclusion map ι : X ↪ Y, and for any f : Y → Z, we have

ι*f = f|_X,

where f|_X is the restriction of f to X [1].
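Since the pullback is just precomposition, it is easy to sketch with higher-order functions. The concrete maps `phi`, `psi` and `f` below are arbitrary stand-ins chosen for illustration; the check compares both sides of the composition rule (Ψ ∘ Φ)* = Φ* ∘ Ψ* on a few sample points.

```python
def pullback(phi):
    """Return the map phi* sending f to the composite x -> f(phi(x))."""
    return lambda f: (lambda x: f(phi(x)))

def compose(g, f):
    """Return g o f, the function x -> g(f(x))."""
    return lambda x: g(f(x))

phi = lambda x: x + 1   # stands in for a map X -> Y
psi = lambda y: 2 * y   # stands in for a map Y -> Z
f = lambda z: z * z     # a function defined on Z

lhs = pullback(compose(psi, phi))(f)     # (psi o phi)* f
rhs = pullback(phi)(pullback(psi)(f))    # phi* (psi* f)
agree = all(lhs(x) == rhs(x) for x in range(10))
```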
11.25 set closed under an operation

A set X is said to be closed under some map L, if L maps elements in X to elements in X, i.e., L : X → X. More generally, suppose Y is the n-fold cartesian product Y = X × ··· × X. If L is a map L : Y → X, then we also say that X is closed under the map L.

The above definition has no relation with the definition of a closed set in topology. Instead, one should think of X and L as a closed system.

Examples

1. The set of invertible matrices is closed under matrix inversion. This means that the inverse of an invertible matrix is again an invertible matrix.

2. Let C(X) be the set of complex valued continuous functions on some topological space X. Suppose f, g are functions in C(X). Then we define the pointwise product of f and g as the function fg : x ↦ f(x)g(x). Since fg is continuous, we have that C(X) is closed under pointwise multiplication.

In the first example, the operation is of the type X → X. In the latter, pointwise multiplication is a map C(X) × C(X) → C(X).
11.26 signature of a permutation

Let X be a finite set, and let G be the group of permutations of X (see permutation group). There exists a unique homomorphism χ from G to the multiplicative group {−1, 1} such that χ(t) = −1 for any transposition (loc. cit.) t ∈ G. The value χ(g), for any g ∈ G, is called the signature or sign of the permutation g. If χ(g) = 1, g is said to be of even parity; if χ(g) = −1, g is said to be of odd parity.

Proposition: If X is totally ordered by a relation <, then for all g ∈ G,

χ(g) = (−1)^{k(g)}    (11.26.1)

where k(g) is the number of pairs (x, y) ∈ X × X such that x < y and g(x) > g(y). (Such a pair is sometimes called an inversion of the permutation g.)

Proof: This is clear if g is the identity map X → X. If g is any other permutation, then for some consecutive a, b ∈ X we have a < b and g(a) > g(b). Let h ∈ G be the transposition of a and b. We have

k(h ∘ g) = k(g) − 1
χ(h ∘ g) = −χ(g)

and the proposition follows by induction on k(g).
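Formula (11.26.1) gives a direct way to compute the signature by counting inversions. The sketch below assumes a permutation of {0, ..., n−1} is stored as a tuple `g` with `g[i]` the image of `i`; the helper names are illustrative. It also checks on all permutations of four elements that the resulting map is multiplicative.

```python
from itertools import permutations

def inversions(g):
    """Count pairs i < j with g[i] > g[j]."""
    n = len(g)
    return sum(1 for i in range(n) for j in range(i + 1, n) if g[i] > g[j])

def signature(g):
    """Return (-1) raised to the number of inversions of g."""
    return (-1) ** inversions(g)

def compose(h, g):
    """Return h o g, the permutation x -> h(g(x))."""
    return tuple(h[g[x]] for x in range(len(g)))

swap = (1, 0, 2)  # the transposition exchanging 0 and 1

multiplicative = all(
    signature(compose(h, g)) == signature(h) * signature(g)
    for h in permutations(range(4))
    for g in permutations(range(4))
)
```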
11.27 subset

Given two sets A and B, we say that A is a subset of B (which we denote as A ⊆ B or simply A ⊂ B) if every element of A is also in B. That is, the following implication holds:

x ∈ A ⇒ x ∈ B.

Some examples: The set A = {d, r, i, t, o} is a subset of the set B = {p, e, d, r, i, t, o} because every element of A is also in B. That is, A ⊆ B.

On the other hand, if C = {p, e, d, r, o}, neither A is a subset of C (because t ∈ A but t ∉ C) nor C is a subset of A (because p ∈ C but p ∉ A). The fact that A is not a subset of C is written as A ⊄ C. And then, in this example we also have C ⊄ A.

If X ⊆ Y and Y ⊆ X, it must be the case that X = Y.

Every set is a subset of itself, and the empty set is a subset of every other set. The set A is called a proper subset of B, if A ⊂ B and A ≠ B (in this case we do not use A ⊆ B).
11.28 surjective

A function f : X → Y is called surjective or onto if, for every y ∈ Y, there is an x ∈ X such that f(x) = y.

Equivalently, f : X → Y is onto when its image is all of the codomain:

Im f = Y.
11.29 transposition

Given a set X = {a_1, a_2, ..., a_n}, a transposition is a bijective function f of X onto itself such that there exist indices i, j such that f(a_i) = a_j, f(a_j) = a_i and f(a_k) = a_k for all other indices k.

Example: If X = {a, b, c, d, e} the function σ given by

σ(a) = a
σ(b) = e
σ(c) = c
σ(d) = d
σ(e) = b

is a transposition.

One of the main results on symmetric groups states that any permutation can be expressed as a composition of transpositions, and for any two decompositions of a given permutation, the number of transpositions is always even or always odd.
11.30 truth table

A truth table is a tabular listing of all possible input value combinations for a truth function and their corresponding output values. For n input variables, there will always be 2^n rows in the truth table. A sample truth table for (a ∧ b) → c would be

a b c (a∧b)→c
F F F T
F F T T
F T F T
F T T T
T F F T
T F T T
T T F F
T T T T

(Note that ∧ represents conjunction, while → represents the conditional truth function.)
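A truth table for any truth function can be generated by enumerating all 2^n input rows. The helper `truth_table` below is a sketch written for this example; it is applied to (a ∧ b) → c, which is false only when a and b are true and c is false.

```python
from itertools import product

def truth_table(f, n):
    """List every row of inputs for an n-ary truth function f, followed by f's value."""
    return [(*row, f(*row)) for row in product([False, True], repeat=n)]

# (a AND b) -> c, written as NOT (a AND b) OR c.
table = truth_table(lambda a, b, c: (not (a and b)) or c, 3)
false_rows = [row for row in table if not row[-1]]
```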
Chapter 12


03-X X — Mathematical logic and 
foundations 


12.1 standard enumeration 


The standard enumeration of {0, 1}* is the sequence of strings s_0 = λ, s_1 = 0, s_2 = 1, s_3 = 00, s_4 = 01, ··· in lexicographic order.

The characteristic function of a language A is χ_A : N → {0, 1} such that

χ_A(n) = 1, if s_n ∈ A,
χ_A(n) = 0, if s_n ∉ A.

The characteristic sequence of a language A (also denoted as χ_A) is the concatenation of the values of the characteristic function in the natural order.
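The enumeration lists strings by length and, within each length, in lexicographic order. The following sketch generates it lazily and computes a prefix of a characteristic sequence; the sample language (strings with an even number of 1s) and the function names are assumptions made for illustration.

```python
from itertools import count, islice, product

def standard_enumeration():
    """Yield the strings over {0, 1} by increasing length, then lexicographically."""
    for length in count(0):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def characteristic_prefix(in_language, n):
    """Return the first n symbols of the characteristic sequence of a language,
    where the language is given as a membership predicate on strings."""
    return "".join(
        "1" if in_language(s) else "0"
        for s in islice(standard_enumeration(), n)
    )

first = list(islice(standard_enumeration(), 7))
even_ones = characteristic_prefix(lambda s: s.count("1") % 2 == 0, 7)
```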
Chapter 13


03B05 — Classical propositional logic 


13.1 CNF 


A propositional formula is a CNF formula, meaning Conjunctive normal form, if it is a conjunction of disjunctions of literals (a literal is a propositional variable or its negation).

Hence, a CNF is a formula of the form: K_1 ∧ K_2 ∧ ... ∧ K_n, where each K_i is of the form l_{i1} ∨ l_{i2} ∨ ... ∨ l_{im} for literals l_{ij} and some m.

Example: (x ∨ y ∨ ¬z) ∧ (y ∨ ¬w ∨ ¬u) ∧ (x ∨ v ∨ u).
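A CNF formula has a natural representation as a list of clauses, each clause a list of literals. In the sketch below a literal is a hypothetical (variable name, polarity) pair, and the formula (x ∨ ¬y) ∧ (y ∨ z) is an illustrative example rather than one taken from the text.

```python
from itertools import product

def eval_cnf(cnf, assignment):
    """A CNF is true when every clause contains at least one true literal."""
    return all(
        any(assignment[var] == positive for var, positive in clause)
        for clause in cnf
    )

# (x OR NOT y) AND (y OR z)
cnf = [[("x", True), ("y", False)], [("y", True), ("z", True)]]
names = ["x", "y", "z"]

# Collect every assignment of truth values that satisfies the formula.
models = [
    dict(zip(names, values))
    for values in product([False, True], repeat=len(names))
    if eval_cnf(cnf, dict(zip(names, values)))
]
```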
13.2 Proof that contrapositive statement is true using logical equivalence

You can see that the contrapositive of an implication is true by considering the following:

The statement p ⇒ q is logically equivalent to ¬p ∨ q.

By the same token, the contrapositive statement ¬q ⇒ ¬p is logically equivalent to ¬(¬q) ∨ ¬p which, using double negation on q, becomes q ∨ ¬p.

This, of course, is the same logical statement.
13.3 contrapositive

Given an implication of the form

p → q

("p implies q") the contrapositive of this implication is

¬q → ¬p

("not q implies not p").

An implication and its contrapositive are equivalent statements. When proving a theorem, it is often more convenient or more intuitive to prove the contrapositive instead.
13.4 disjunction

A disjunction is true if either of its parameters (called disjuncts) is true. Disjunction does not correspond to "or" in English (see exclusive or). Disjunction uses the symbol ∨ or sometimes + when taken in algebraic context. Hence, disjunction of a and b would be written

a ∨ b

or

a + b

The truth table for disjunction is

a b a∨b
F F F
F T T
T F T
T T T
13.5 equivalent

Two statements A and B are said to be (logically) equivalent if A is true if and only if B is true (that is, A implies B and B implies A). This is usually written as A ⇔ B. For example, for any integer z, the statement "z is positive" is equivalent to "z is not negative and z ≠ 0".
13.6 implication

An implication is a logical construction that essentially tells us if one condition is true, then another condition must be also true. Formally it is written

a → b

or

a ⇒ b

which would be read "a implies b", or "a therefore b", or "if a, then b" (to name a few).

Implication is often confused for "if and only if", or the biconditional truth function (⇔). They are not, however, the same. The implication a → b is true even if only b is true. So the statement "pigs have wings, therefore it is raining today" is true if it is indeed raining, despite the fact that the first item is false.

In fact, any implication a → b is called vacuously true when a is false. By contrast, a ⇔ b would be false if either a or b was by itself false (a ⇔ b ≡ (a ∧ b) ∨ (¬a ∧ ¬b), or in terms of implication as (a → b) ∧ (b → a)).

It may be useful to remember that a → b only tells you that it cannot be the case that b is false while a is true; b must "follow" from a (and "false" does follow from "false"). Alternatively, a → b is in fact equivalent to

b ∨ ¬a

The truth table for implication is therefore

a b a→b
F F T
F T T
T F F
T T T
13.7 propositional logic

A propositional logic is a logic in which the only objects are propositions, that is, objects which themselves have truth values. Variables represent propositions, and there are no relations, functions, or quantifiers except for the constants T and ⊥ (representing true and false respectively). The connectives are typically ¬, ∧, ∨, and → (representing negation, conjunction, disjunction, and implication); however, this set is redundant, and other choices can be used (T and ⊥ can also be considered 0-ary connectives).

A model for propositional logic is just a truth function v on a set of variables. Such a truth function can be easily extended to a truth function v̄ on all formulas which contain only the variables v is defined on by adding recursive clauses for the usual definitions of connectives. For instance v̄(α ∧ β) = T iff v̄(α) = v̄(β) = T.

Then we say v ⊨ φ if v̄(φ) = T, and we say ⊨ φ if for every v such that v̄(φ) is defined, v ⊨ φ (and say that φ is a tautology).

Propositional logic is decidable: there is an easy way to determine whether a sentence is a tautology. It can be done using truth tables, since a truth table for a particular formula can be easily produced, and the formula is a tautology if every assignment of truth values makes it true. A truth table has exponentially many rows in the number of variables, and it is not known whether an efficient method exists: the equivalent problem of whether a formula is satisfiable (that is, whether its negation is not a tautology) is a canonical example of an NP-complete problem.
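The truth-table decision procedure can be sketched directly. In the example below a formula is modelled as a Python function of named Boolean arguments, which is an assumption made for illustration; the checker simply tries all 2^n assignments, so it is only practical for a handful of variables.

```python
from itertools import product

def is_tautology(formula, variables):
    """True if formula holds under every assignment of truth values to variables."""
    return all(
        formula(**dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )

def is_satisfiable(formula, variables):
    """A formula is satisfiable exactly when its negation is not a tautology."""
    return not is_tautology(lambda **v: not formula(**v), variables)

def implies(a, b):
    return (not a) or b

# An implication agrees with its contrapositive under every assignment.
contrapositive = lambda p, q: implies(p, q) == implies(not q, not p)
# p AND NOT p is never true.
contradiction = lambda p: p and not p
```

For instance, `is_tautology(contrapositive, ["p", "q"])` checks all four assignments, while `is_satisfiable(contradiction, ["p"])` reports that no assignment works.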
13.8 theory

If L is a logical language for some logic ℒ, and T is a set of formulas with no free variables, then T is a theory of ℒ.

We write T ⊨ φ for any formula φ if for every model M of ℒ such that M ⊨ T, M ⊨ φ.

We write T ⊢ φ if there is a proof of φ from T.
13.9 transitive

The transitive property of logic is

(a ⇒ b) ∧ (b ⇒ c) ⇒ (a ⇒ c)

where ⇒ is the conditional truth function. From this we can derive that
13.10 truth function

A truth function is a function that returns one of two values, one of which is interpreted as "true", and the other which is interpreted as "false". Typically either "T" and "F" are used, or "1" and "0", respectively. Using the latter, we can write

f : {0, 1}^n → {0, 1}

which defines a truth function f. That is, f is a mapping from any number (n) of true/false (0 or 1) values to a single value, which is 0 or 1.
Chapter 14


03B10 — Classical first-order logic 


14.1 Δ_1 bootstrapping

This proves that a number of useful relations and functions are Δ_1 in first order arithmetic, providing a bootstrapping of parts of mathematical practice into any system including the Δ_1 relations (since the Δ_1 relations are exactly the recursive ones, this includes Turing machines).

First, we want to build a tupling relation which will allow finite sets of numbers to be encoded by a single number. To do this we first show that R(a, b) ↔ a|b is Δ_1. This is true since a|b ↔ ∃c ≤ b (a · c = b), a formula with only bounded quantifiers.

Next note that P(x) ↔ x is prime is Δ_1 since P(x) ↔ ¬∃y < x (¬y = 1 ∧ y|x). Also AP(x, y) ↔ P(x) ∧ P(y) ∧ ∀z < y (x < z → ¬P(z)).


These two can be used to define (the graph of) a primality function, p(a) = the (a+1)-th prime. Let p(a, b) ↔ ∃c < b^{a^2} (¬[2|c] ∧ [∀q < b ∀r ≤ b (AP(q, r) → ∀j < a ([q^j | c] ↔ [r^{j+1} | c]))] ∧ [b^a | c] ∧ ¬[b^{a+1} | c]).

This rather awkward looking formula is worth examining, since it illustrates a principle which will be used repeatedly. c is intended to be a number of the form 2^0 · 3^1 · 5^2 ··· and so on. If it includes b^a but not b^{a+1} then we know that b must be the (a+1)-th prime. The definition is so complicated because we cannot just say, as we'd like to, that p(a + 1) is the smallest prime greater than p(a) (since we don't allow recursive definitions). Instead we embed the sequence of values this recursion would take into a single number (c) and guarantee that the recursive relationship holds for at least a terms; then we just check if the a-th value is b.


Finally, we can define our tupling relation. Technically, since a given relation must have a fixed arity, we define for each n a function ⟨x_0, ..., x_n⟩ = Π_{0≤i≤n} p(i)^{x_i+1}. Then define (x)_i to be the i-th element of x when x is interpreted as a tuple, so ⟨(x)_0, ..., (x)_n⟩ = x. Note that the tupling relation, even taken collectively, is not total. For instance 5 is not a tuple (although it is sometimes convenient to view it as a tuple with "empty spaces": ⟨_, _, 5⟩). In situations like this, and also when attempting to extract entries beyond the length, (x)_i = 0 (for instance, (5)_0 = 0). On the other hand there is a 0-ary tupling relation, ⟨⟩ = 1.

Thanks to our definition of p, we have ⟨x_0, ..., x_n⟩ = x ↔ x = p(0)^{x_0+1} · ··· · p(n)^{x_n+1}. This is clearly Δ_1. (Note that we don't use the Π as above, since we don't have that, but since we have a different tupling function for each n this isn't a problem.)
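The prime-power encoding can be sketched with ordinary integer arithmetic. This example assumes the encoding reads as ⟨x_0, ..., x_n⟩ = p(0)^{x_0+1} ··· p(n)^{x_n+1} with p(0) = 2; the function names `nth_prime`, `encode` and `entry` are introduced here for illustration and, unlike the Δ_1 definitions in the text, freely use recursion and unbounded search.

```python
def nth_prime(i):
    """p(i): the (i+1)-th prime, so nth_prime(0) == 2."""
    found, n = -1, 1
    while found < i:
        n += 1
        # n is prime when no d with 2 <= d <= sqrt(n) divides it.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found += 1
    return n

def encode(xs):
    """Encode a tuple as the product of p(i) ** (x_i + 1)."""
    code = 1
    for i, x in enumerate(xs):
        code *= nth_prime(i) ** (x + 1)
    return code

def entry(code, i):
    """Extract (code)_i: the exponent of p(i) minus one, or 0 if p(i) does not divide code."""
    p, exponent = nth_prime(i), 0
    while code % p == 0:
        code //= p
        exponent += 1
    return exponent - 1 if exponent > 0 else 0

code = encode([3, 0, 2])  # 2**4 * 3**1 * 5**3
```

The shift by one in the exponents is what lets a stored 0 be distinguished from a missing position when the code is factored.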


For the reverse, (x)_i = y ↔ ([p(i)^{y+1} | x] ∧ ¬[p(i)^{y+2} | x]) ∨ ([y = 0] ∧ ¬[p(i) | x]).

Also, define a length function by len(x) = y ↔ ¬[p(y + 1) | x] ∧ ∀z ≤ y [p(z) | x] and a membership relation by in(x, n) ↔ ∃i < len(x) [(x)_i = n].

Armed with this, we can show that all primitive recursive functions are Δ_1. To see this, note that x = 0, the zero function, is trivially recursive, as are x = Sy and p_{n,m}(x_1, ..., x_n) = x_m.

The Δ_1 functions are closed under composition, since if φ(x⃗) and ψ(x⃗) both have no unbounded quantifiers, φ(ψ(x⃗)) obviously doesn't either.


Finally, suppose we have functions f(x⃗) and g(x⃗, m, n) in Δ_1. Then define the primitive recursion h(x⃗, y) by first defining:

h̄(x⃗, y) = z ↔ len(z) = y ∧ ∀i < y [(z)_{i+1} = g(x⃗, i, (z)_i)] ∧ [len(z) = 0 ∨ (z)_0 = f(x⃗)]

and then h(x⃗, y) = (h̄(x⃗, y))_y.

Δ_1 is also closed under minimization: if R(x⃗, y) is a Δ_1 relation then μy.R(x⃗, y) is a function giving the least y satisfying R(x⃗, y). To see this, note that μy.R(x⃗, y) = z ↔ R(x⃗, z) ∧ ∀m < z ¬R(x⃗, m).

Finally, using primitive recursion it is possible to concatenate sequences. First, to concatenate a single number, if s = ⟨x_0, ..., x_n⟩ then s *_1 y = s · p(len(s) + 1)^{y+1}. Then we can define the concatenation of s with t = ⟨y_0, ..., y_m⟩ by defining f(s, t) = s and g(s, t, j, i) = j *_1 (t)_i, and by primitive recursion, there is a function h(s, t, i) whose value is the first i elements of t appended to s. Then s * t = h(s, t, len(t)).

We can also define *_u, which concatenates only elements of t not appearing in s. This just requires defining the graph of g to be g(s, t, j, i, x) ↔ [in(s, (t)_i) ∧ x = j] ∨ [¬in(s, (t)_i) ∧ x = j *_1 (t)_i].
14.2 Boolean

Boolean refers to that which can take on the values "true" or "false", or that which concerns truth and falsity. For example "Boolean variable", "Boolean logic", "Boolean statement", etc.

"Boolean" is named for George Boole, the 19th century mathematician.
14.3 Gödel numbering

A Gödel numbering is any way of assigning numbers to the formulas of a language. This is often useful in allowing sentences of a language to be self-referential. The number associated with a formula φ is called its Gödel number and is denoted ⌜φ⌝.

More formally, if ℒ is a language and G is a surjective partial function from the terms of ℒ to the formulas over ℒ then G is a Gödel numbering. ⌜φ⌝ may be any term t such that G(t) = φ. Note that G is not defined within ℒ (there is no formula or object of ℒ representing G); however, properties of it (such as being in the domain of G, being a subformula, and so on) are.

Although anything meeting the properties above is a Gödel numbering, depending on the specific language and usage, any of the following properties may also be desired (and can often be found if more effort is put into the numbering):

• If φ is a subformula of ψ then ⌜φ⌝ < ⌜ψ⌝
• For every number n, there is some φ such that ⌜φ⌝ = n
• G is injective
14.4 Gödel's incompleteness theorems

Gödel's first and second incompleteness theorems are perhaps the most celebrated results in mathematical logic. The basic idea behind Gödel's proofs is that by the device of Gödel numbering one can formulate properties of theories and sentences as arithmetical properties of the corresponding Gödel numbers, thus allowing 1st order arithmetic to speak of its own consistency, provability of some sentence and so forth.

The original result Gödel proved in his classic paper On Formally Undecidable Propositions in Principia Mathematica and Related Systems can be stated as

Theorem 1. No theory T axiomatisable in the type system of PM (i.e. in Russell's theory of types) which contains Peano-arithmetic and is ω-consistent proves all true theorems of arithmetic (and no false ones).
Stated this way, the theorem is an obvious corollary of Tarski's result on the undefinability of Truth. This can be seen as follows. Consider a Gödel numbering G, which assigns to each formula φ its Gödel number ⌜φ⌝. The set of Gödel numbers of all true sentences of arithmetic is {⌜φ⌝ | N ⊨ φ}, and by Tarski's result it isn't definable by any arithmetic formula. But assume there's a theory T an axiomatisation Ax_T of which is definable in arithmetic and which proves all true statements of arithmetic. But now ∃P(P is a proof of x from Ax_T) defines the set of (Gödel numbers of) true sentences of arithmetic, which contradicts Tarski's result.

The proof given above is highly non-constructive. A much stronger version can actually be extracted from Gödel's paper, namely that

Theorem 2. There is a primitive recursive function G, s.t. if T is a theory with a p.r. axiomatisation α, and if all primitive recursive functions are representable in T then N ⊨ G(α) but T ⊬ G(α)


This second form of the theorem is the one usually proved, although the theorem is usually 
stated in a form for which the nonconstructive proof based on Tarski’s result would suffice. 
The proof for this stronger version is based on a similar idea as Tarski’s result. 


Consider the formula ∃P(P is a proof of x from α), which defines a predicate Prov_α(x) which represents provability from α. Assume we have enumerated the open formulae with one variable in a sequence B_i, so that every open formula occurs. Consider now the sentence ¬Prov_α(B_x), which defines the non-provability from α predicate. Now, since ¬Prov_α(B_x) is an open formula with one variable, it must be B_k for some k. Thus we can consider the closed sentence B_k(k). This sentence is equivalent to ¬Prov_α(subst(⌜¬Prov_α(x)⌝, k)), but since subst(⌜¬Prov_α(x)⌝, k) is just B_k(k), it "asserts its own unprovability".

Since all the steps we took to get the undecided but true sentence subst(⌜¬Prov_α(x)⌝, k), which is just B_k(k), were very mechanical manipulations of Gödel numbers guaranteed to terminate in bounded time, we have in fact produced the p.r. function G required by the statement of the theorem.


The first version of the proof can be used to show that also many non-axiomatisable theories are incomplete. For example, consider PA + all true Π_1 sentences. Since Π_1 truth is definable at Π_1-level, this theory is definable in arithmetic by a formula α. However, it's not complete, since otherwise ∃p(p is a proof of x from α) would define the set of true sentences of arithmetic. This can be extended to show that no arithmetically definable theory with sufficient expressive power is complete.

The second version of Gödel's first incompleteness theorem suggests a natural way to extend theories to stronger theories which are exactly as sound as the original theories. This sort of process has been studied by Turing, Feferman, Fenstad and others under the names of ordinal logics and transfinite recursive progressions of arithmetical theories.

Gödel's second incompleteness theorem concerns what a theory can prove about its own provability predicate, in particular whether it can prove that no contradiction is provable. The answer under very general settings is that a theory can't prove that it is consistent without actually being inconsistent.


The second incompleteness theorem is best presented by means of a provability logic. Consider an arithmetic theory T which is p.r. axiomatised by α. We extend the language this theory is expressed in with a new sentence forming operator □, so that any sentence in parentheses prefixed by □ is a sentence. Thus for example, □(0 = 1) is a formula. Intuitively, we want □(φ) to express the provability of φ from α. Thus the semantics of our new language is exactly the same as that of the original language, with the additional rule that □(φ) is true if and only if α ⊢ φ. There is a slight difficulty here; φ might itself contain boxed expressions, and we haven't yet provided any semantics for these. The answer is simple: whenever a boxed expression □(ψ) occurs within the scope of another box, we replace it with the arithmetical statement Prov_α(⌜ψ⌝). Thus for example the truth of □(□(0 = 1)) is equivalent to α ⊢ Prov_α(⌜0 = 1⌝). Assuming that α is strong enough to prove all true instances of Prov_α(⌜φ⌝) we can in fact interpret the whole of the new boxed language by the translation. This is what we shall do, so formally α ⊢ φ (where φ might contain boxed sentences) is taken to mean α ⊢ φ*, where φ* is obtained by replacing the boxed expressions with arithmetical formulae as above.


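There are a number of restrictions we must impose on $\alpha$ (and thus on $\Box$, the meaning of
which is determined by $\alpha$). These are known as the Hilbert-Bernays derivability conditions and
they are as follows:


• if $\alpha \vdash \varphi$ then $\alpha \vdash \Box(\varphi)$
• $\alpha \vdash \Box(\varphi) \to \Box(\Box(\varphi))$
• $\alpha \vdash \Box(\varphi \to \psi) \to (\Box(\varphi) \to \Box(\psi))$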


A statement Cons asserts the consistency of $\alpha$ if it is equivalent to $\neg\Box(0 = 1)$. Gödel’s first
incompleteness theorem shows that there is a sentence $B_r(k)$ for which the following is true:
$\neg\Box(0 = 1) \to \Diamond(B_r(k)) \wedge \Diamond(\neg B_r(k))$, where $\Diamond$ is the dual of $\Box$, i.e. $\Diamond(\varphi) \leftrightarrow \neg\Box(\neg\varphi)$.
A careful analysis reveals that this is provable in any $\alpha$ which satisfies the derivability
conditions, i.e. $\alpha \vdash \neg\Box(0 = 1) \to \Diamond(B_r(k)) \wedge \Diamond(\neg B_r(k))$. Assume now that $\alpha$ can prove
$\neg\Box(0 = 1)$, i.e. that $\alpha$ can prove its own consistency. Then $\alpha$ can prove $\Diamond(B_r(k)) \wedge \Diamond(\neg B_r(k))$.
But this means that $\alpha$ can prove $B_r(k)$! Thus $\alpha$ is inconsistent.
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14.5 Lindenbaum algebra 


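Let $L$ be a first order language. We define the equivalence relation $\sim$ over formulas of $L$ by
$\varphi \sim \psi$ if and only if $\vdash \varphi \leftrightarrow \psi$. Let $B = L/\sim$ be the set of equivalence classes. We define
the operations $\oplus$ and $\cdot$ and complementation, denoted $\overline{[\varphi]}$, on $B$ by:


\[ [\varphi] \oplus [\psi] = [\varphi \vee \psi] \]
\[ [\varphi] \cdot [\psi] = [\varphi \wedge \psi] \]
\[ \overline{[\varphi]} = [\neg\varphi] \]


We let $0 = [\varphi \wedge \neg\varphi]$ and $1 = [\varphi \vee \neg\varphi]$. Then the structure $(B, \oplus, \cdot, \overline{\phantom{x}}, 0, 1)$ is a Boolean algebra
called the Lindenbaum algebra.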
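

As a small illustration, the construction can be carried out mechanically in the propositional case, where provable equivalence can be decided by comparing truth tables. This restriction to two propositional variables is an assumption of the sketch (first order logic has no such decision procedure), and the names `truth_table`, `join`, `meet` and `comp` are hypothetical.

```python
from itertools import product

VARS = ("p", "q")

def truth_table(formula):
    """Truth table of `formula` (a Python function of p, q) as a tuple of booleans."""
    return tuple(bool(formula(*vals))
                 for vals in product((False, True), repeat=len(VARS)))

# An equivalence class [phi] is represented by the truth table its members share.
def join(a, b):
    """[phi] (+) [psi] = [phi or psi]"""
    return tuple(x or y for x, y in zip(a, b))

def meet(a, b):
    """[phi] . [psi] = [phi and psi]"""
    return tuple(x and y for x, y in zip(a, b))

def comp(a):
    """Complement of [phi] = [not phi]"""
    return tuple(not x for x in a)

p = truth_table(lambda p, q: p)
q = truth_table(lambda p, q: q)
ZERO = meet(p, comp(p))   # [phi and not phi]
ONE = join(p, comp(p))    # [phi or not phi]

# Syntactically different but equivalent formulas land in the same class
# (here: De Morgan's law).
de_morgan_left = truth_table(lambda p, q: not (p and q))
de_morgan_right = join(comp(p), comp(q))
```

With this representation the Boolean algebra laws reduce to pointwise facts about `and`, `or` and `not`.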


Note that it may be possible to define the Lindenbaum algebra on extensions of first order logic,
as long as there is a notion of formal proof that can allow the definition of the equivalence
relation.
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14.6 Lindstrom’s theorem 


One of the very first results of the study of model theoretic logics is a characterisation
theorem due to Per Lindström. He showed that classical first order logic is the strongest
logic having the following properties:


• Being closed under contradictory negation
• Compactness
• Löwenheim-Skolem theorem


He also showed that first order logic can be characterised as the strongest logic for which
the following hold:


• Completeness (r.e. axiomatisability)
• Löwenheim-Skolem theorem

The notion of “strength” used here is as follows. A logic $L'$ is stronger than $L$ or as strong
if the class of sets definable in $L$ is contained in the class of sets definable in $L'$.


14.7 Presburger arithmetic


Presburger arithmetic is a weakened form of arithmetic which includes the structure
$\mathbb{N}$, the constant $0$, the unary function $S$, the binary function $+$, and the binary relation $<$.
Essentially, it is Peano arithmetic without multiplication.


Presburger arithmetic is decidable, but is consequently very limited in what it can express.
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14.8 R-minimal element 


Let $S$ be a set and $R$ be a relation on $S$. An element $a \in S$ is said to be $R$-minimal if and
only if there is no $x \in S$ such that $xRa$.
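

For a finite set the definition can be checked directly. The following is a minimal sketch; the function name `r_minimal_elements` and the example relation `proper_divides` are chosen here for illustration only.

```python
def r_minimal_elements(S, R):
    """Return the elements a of S for which no x in S satisfies x R a."""
    return {a for a in S if not any(R(x, a) for x in S)}

# Example relation (an assumption for illustration):
# x R a iff x divides a, with x different from 1 and from a.
def proper_divides(x, a):
    return x != 1 and x != a and a % x == 0

minimal = r_minimal_elements(set(range(2, 20)), proper_divides)
```

On the set {2, ..., 19} with this relation, the minimal elements are exactly the primes below 20.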
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14.9 Skolemization 


Skolemization is a way of removing existential quantifiers from a formula. Variables bound
by existential quantifiers which are not inside the scope of universal quantifiers can simply
be replaced by constants: $\exists x[x < 3]$ can be changed to $c < 3$, with $c$ a suitable constant.


When the existential quantifier is inside a universal quantifier, the bound variable must
be replaced by a Skolem function of the variables bound by universal quantifiers. Thus
$\forall x[x = 0 \vee \exists y[x = y + 1]]$ becomes $\forall x[x = 0 \vee x = f(x) + 1]$.


This is used in second order logic to move all existential quantifiers outside the scope of first
order universal quantifiers. This can be done since second order quantifiers can quantify over
functions. For instance $\forall^1 x \forall^1 y \exists^1 z \varphi(x, y, z)$ is equivalent to $\exists^2 F \forall^1 x \forall^1 y \varphi(x, y, F(x, y))$.
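

The replacement step can be sketched on a small formula tree. The representation below is an assumption made for illustration: variables are strings, terms are tuples `(name, *args)`, formulas are tuples tagged `"atom"`, `"and"`, `"or"`, `"forall"`, `"exists"`, there is no negation, and all bound variables have distinct names. The generated names `sk1`, `sk2`, ... are hypothetical Skolem symbols.

```python
from itertools import count

def subst(term, var, repl):
    """Replace variable `var` by `repl` in a term."""
    if isinstance(term, str):
        return repl if term == var else term
    return (term[0],) + tuple(subst(t, var, repl) for t in term[1:])

def subst_formula(f, var, repl):
    """Replace free occurrences of `var` by `repl` in a formula."""
    tag = f[0]
    if tag == "atom":
        return ("atom", f[1]) + tuple(subst(t, var, repl) for t in f[2:])
    if tag in ("and", "or"):
        return (tag, subst_formula(f[1], var, repl), subst_formula(f[2], var, repl))
    if f[1] == var:          # a quantifier rebinding `var`: nothing free below
        return f
    return (tag, f[1], subst_formula(f[2], var, repl))

def skolemize(f, universals=(), fresh=None):
    """Remove existential quantifiers, introducing Skolem constants and functions."""
    fresh = fresh if fresh is not None else count(1)
    tag = f[0]
    if tag == "atom":
        return f
    if tag in ("and", "or"):
        return (tag, skolemize(f[1], universals, fresh),
                skolemize(f[2], universals, fresh))
    if tag == "forall":
        return ("forall", f[1], skolemize(f[2], universals + (f[1],), fresh))
    # tag == "exists": a Skolem term over the enclosing universal variables
    # (a constant when there are none)
    sk_term = (f"sk{next(fresh)}",) + universals
    return skolemize(subst_formula(f[2], f[1], sk_term), universals, fresh)

# forall x [x = 0 or exists y [x = y + 1]]
example = ("forall", "x",
           ("or", ("atom", "=", "x", ("0",)),
                  ("exists", "y", ("atom", "=", "x", ("+", "y", ("1",))))))
result = skolemize(example)
```

Here `result` replaces `y` by the term `("sk1", "x")`, which plays the role of $f(x)$ in the second example above.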
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14.10 arithmetical hierarchy 


The arithmetical hierarchy is a hierarchy of either (depending on the context) formulas
or relations. The relations of a particular level of the hierarchy are exactly the relations
defined by the formulas of that level, so the two uses are essentially the same.


The first level consists of formulas with only bounded quantifiers; the corresponding relations
are also called the primitive recursive relations (this definition is equivalent to the definition
from computer science). This level is called any of $\Delta^0_0$, $\Sigma^0_0$ and $\Pi^0_0$, depending on context.


A formula $\varphi$ is $\Sigma^0_n$ if there is some $\Delta^0_0$ formula $\psi$ such that $\varphi$ can be written:


\[ \varphi(\vec{k}) = \exists x_1 \forall x_2 \cdots Q x_n \, \psi(\vec{k}, \vec{x}) \]


where $Q$ is either $\forall$ or $\exists$, whichever maintains the pattern of alternating quantifiers.


The $\Sigma^0_1$ relations are the same as the recursively enumerable relations.


Similarly, $\varphi$ is a $\Pi^0_n$ relation if there is some $\Delta^0_0$ formula $\psi$ such that:


\[ \varphi(\vec{k}) = \forall x_1 \exists x_2 \cdots Q x_n \, \psi(\vec{k}, \vec{x}) \]


where $Q$ is either $\forall$ or $\exists$, whichever maintains the pattern of alternating quantifiers.


A formula is $\Delta^0_n$ if it is both $\Sigma^0_n$ and $\Pi^0_n$. Since each $\Sigma^0_n$ formula is just the negation of a $\Pi^0_n$
formula and vice-versa, the $\Sigma^0_n$ relations are the complements of the $\Pi^0_n$ relations.
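

The link between a single unbounded existential quantifier and recursive enumerability can be sketched as an unbounded search for a witness. In the sketch below, `sigma1_semidecide` and the example matrix `collatz_reaches_one` are hypothetical names chosen for illustration; the matrix is decidable, but no bound on the witness is assumed.

```python
from itertools import count

def sigma1_semidecide(matrix, k, max_witness=None):
    """Search for a witness x with matrix(k, x), mirroring phi(k) = exists x. psi(k, x).

    Returns the first witness found. With max_witness=None the search does not
    terminate when no witness exists, which is the behaviour of a semi-decision
    procedure. With a bound, None is returned if no witness is found up to it.
    """
    candidates = count() if max_witness is None else range(max_witness + 1)
    for x in candidates:
        if matrix(k, x):
            return x
    return None

def collatz_reaches_one(k, x):
    """Decidable matrix: applying the Collatz map x times to k (k >= 1) gives 1."""
    n = k
    for _ in range(x):
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return n == 1

witness = sigma1_semidecide(collatz_reaches_one, 6)
```

The search confirms membership whenever a witness exists, but on its own it gives no way of confirming non-membership.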


The relations in $\Delta^0_1 = \Sigma^0_1 \cap \Pi^0_1$ are the recursive relations.


Higher levels on the hierarchy correspond to broader and broader classes of relations. A
formula or relation which is $\Sigma^0_n$ (or, equivalently, $\Pi^0_n$) for some integer $n$ is called arithmetical.


The superscript 0 is often omitted when it is not necessary to distinguish from the analytic hierarchy.


Functions can be described as being in one of the levels of the hierarchy if the graph of the
function is in that level.
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14.11 arithmetical hierarchy is a proper hierarchy 


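By definition, we have $\Delta_n = \Pi_n \cap \Sigma_n$. In addition, $\Sigma_n \cup \Pi_n \subseteq \Delta_{n+1}$.


This is proved by vacuous quantification. If $R$ is equivalent to $\varphi(\vec{n})$ then $R$ is equivalent to
$\forall x \varphi(\vec{n})$ and $\exists x \varphi(\vec{n})$, where $x$ is some variable that does not occur free in $\varphi$.


More significant is the proof that all containments are proper. First, let $n \geq 1$ and let $U$ be
a 2-ary relation universal for the unary $\Sigma_n$ relations. Then $D(x) \leftrightarrow U(x, x)$ is obviously $\Sigma_n$. But suppose $D \in \Delta_n$.
Then $D \in \Pi_n$, so $\neg D \in \Sigma_n$. Since $U$ is universal, there is some $e$ such that $\neg D(x) \leftrightarrow U(e, x)$,
and therefore $\neg D(e) \leftrightarrow U(e, e) \leftrightarrow \neg U(e, e)$. This is clearly a contradiction, so $D \in \Sigma_n \setminus \Delta_n$
and $\neg D \in \Pi_n \setminus \Delta_n$.


In addition, consider the recursive join of $D$ and $\neg D$, defined by


\[ D \oplus \neg D(x) \leftrightarrow \exists y \leq x [x = 2 \cdot y \wedge D(y)] \vee \exists y \leq x [x = 2 \cdot y + 1 \wedge \neg D(y)] \]


Clearly both $D$ and $\neg D$ can be recovered from $D \oplus \neg D$, so it is contained in neither $\Sigma_n$ nor
$\Pi_n$. However, the definition above has only bounded quantifiers except for those in $D$ and
$\neg D$, so $D \oplus \neg D \in \Delta_{n+1} \setminus (\Sigma_n \cup \Pi_n)$.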
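

The diagonal step of the argument can be illustrated on a finite table. This is only a sketch: the list `rows` is a hypothetical stand-in for an enumeration of predicates, and a genuine universal relation would index infinitely many of them.

```python
def diagonal_complement(U):
    """Given U(e, x), return the predicate x -> not U(x, x)."""
    return lambda x: not U(x, x)

# Hypothetical finite stand-in for a universal relation: row e is the e-th predicate.
rows = [
    lambda x: x % 2 == 0,
    lambda x: x > 1,
    lambda x: False,
    lambda x: x == 3,
]

def U(e, x):
    return rows[e](x)

not_D = diagonal_complement(U)

# not_D disagrees with row e at the argument e, so it is none of the rows.
disagreements = [not_D(e) != U(e, e) for e in range(len(rows))]
```

If `not_D` were itself row `e` of the table, it would have to disagree with itself at `e`, which is the contradiction used in the proof.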
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14.12 atomic formula 


Let $L$ be a first order language and suppose it has signature $\Sigma$. A formula $\varphi$ of $L$ is said to
be atomic if and only if:


1. $\varphi$ = “$t_1 = t_2$”, where $t_1$ and $t_2$ are terms, or
2. $\varphi$ = “$R(t_1, \ldots, t_n)$”, where $R \in \Sigma$ is an $n$-ary relation symbol and $t_1, \ldots, t_n$ are terms.
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14.13 creating an infinite model 


From the syntactic compactness theorem for first order logic, we get this nice (and useful)
result:


Let $T$ be a theory of first-order logic. If $T$ has finite models of unboundedly large sizes, then
$T$ also has an infinite model.


Define the sentences


\[ \Phi_n \equiv \exists x_1 \ldots \exists x_n . (x_1 \neq x_2) \wedge \ldots \wedge (x_1 \neq x_n) \wedge (x_2 \neq x_3) \wedge \ldots \wedge (x_{n-1} \neq x_n) \]


($\Phi_n$ says “there exist (at least) $n$ different elements in the world”). Note that $\ldots \vdash \Phi_n \vdash
\ldots \vdash \Phi_2 \vdash \Phi_1$. Define a new theory


\[ T_\infty = T \cup \{\Phi_n\}_{n \geq 1} \]


For any finite subset $T' \subset T_\infty$, we claim that $T'$ is consistent. Indeed, $T'$ contains axioms of
$T$, along with finitely many of $\{\Phi_n\}_{n \geq 1}$. Let $\Phi_m$ correspond to the largest index appearing in
$T'$. If $M_m \models T$ is a model of $T$ with at least $m$ elements (and by hypothesis, such a model
exists), then $M_m \models T \cup \{\Phi_m\} \vdash T'$.


So every finite subset of $T_\infty$ is consistent; by the compactness theorem for first-order logic,
$T_\infty$ is consistent, and by Gödel’s completeness theorem for first-order logic it has a model
$M$. Then $M \models T_\infty \vdash T$, so $M$ is a model of $T$ with infinitely many elements ($M \models \Phi_n$ for
any $n$, so $M$ has at least $n$ elements for all $n$).
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14.14 criterion for consistency of sets of formulas 


Let $L$ be a first order language, and $\Delta \subseteq L$ be a set of sentences. Then $\Delta$ is consistent if
and only if every finite subset of $\Delta$ is consistent.
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14.15 deductions are A, 


Using the example of Gödel numbering, we can show that $\mathrm{Proves}(a, x)$ (the statement that
$a$ is a proof of $x$, which will be formally defined below) is $\Delta_1$.


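First, $\mathrm{Term}(x)$ should be true iff $x$ is the Gödel number of a term. Thanks to primitive
recursion, we can define it by:


\[ \mathrm{Term}(x) \leftrightarrow \exists i < x [x = \langle 0, i \rangle] \vee \]
\[ x = \langle 5 \rangle \vee \]
\[ \exists y < x [x = \langle 6, y \rangle \wedge \mathrm{Term}(y)] \vee \]
\[ \exists y, z < x [x = \langle 8, y, z \rangle \wedge \mathrm{Term}(y) \wedge \mathrm{Term}(z)] \vee \]
\[ \exists y, z < x [x = \langle 9, y, z \rangle \wedge \mathrm{Term}(y) \wedge \mathrm{Term}(z)] \]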


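Then $\mathrm{AtForm}(x)$, which is true when $x$ is the Gödel number of an atomic formula, is defined
by:


\[ \mathrm{AtForm}(x) \leftrightarrow \exists y, z < x [x = \langle 1, y, z \rangle \wedge \mathrm{Term}(y) \wedge \mathrm{Term}(z)] \vee \]
\[ \exists y, z < x [x = \langle 7, y, z \rangle \wedge \mathrm{Term}(y) \wedge \mathrm{Term}(z)] \]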


Next, $\mathrm{Form}(x)$, which is true only if $x$ is the Gödel number of a formula, is defined recursively
by:


\[ \mathrm{Form}(x) \leftrightarrow \mathrm{AtForm}(x) \vee \]
\[ \exists i, y < x [x = \langle 2, i, y \rangle \wedge \mathrm{Form}(y)] \vee \]
\[ \exists y < x [x = \langle 3, y \rangle \wedge \mathrm{Form}(y)] \vee \]
\[ \exists y, z < x [x = \langle 4, y, z \rangle \wedge \mathrm{Form}(y) \wedge \mathrm{Form}(z)] \]


$\mathrm{QFForm}(x)$, which is true when $x$ is the Gödel number of a quantifier free formula,
is defined the same way except without the second clause.
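

The three recursive definitions can be mirrored by recognizers on nested tuples. This is a sketch under stated assumptions: codes are kept as Python tuples rather than folded into single numbers, the tags follow the clauses above (0 variable, 5 the constant 0, 6 successor, 8 and 9 the binary term operations, 1 and 7 the atomic formulas, 2 quantification, 3 and 4 the connectives), a quantified variable is stored as its code `(0, i)`, and the function names are hypothetical.

```python
def is_term(x):
    """Mirror of Term(x) on nested tuples."""
    if not isinstance(x, tuple) or not x:
        return False
    tag, args = x[0], x[1:]
    if tag == 0:                      # variable v_i, coded <0, i>
        return len(args) == 1 and isinstance(args[0], int)
    if tag == 5:                      # coded <5>
        return args == ()
    if tag == 6:                      # coded <6, y>
        return len(args) == 1 and is_term(args[0])
    if tag in (8, 9):                 # coded <8, y, z> and <9, y, z>
        return len(args) == 2 and all(is_term(a) for a in args)
    return False

def is_atomic(x):
    """Mirror of AtForm(x): <1, y, z> or <7, y, z> with y, z terms."""
    return (isinstance(x, tuple) and len(x) == 3 and x[0] in (1, 7)
            and is_term(x[1]) and is_term(x[2]))

def is_formula(x, allow_quantifiers=True):
    """Mirror of Form(x); with allow_quantifiers=False it mirrors QFForm(x)."""
    if is_atomic(x):
        return True
    if not isinstance(x, tuple) or not x:
        return False
    tag, args = x[0], x[1:]
    if tag == 2:                      # quantifier clause, dropped for QFForm
        return (allow_quantifiers and len(args) == 2
                and is_term(args[0]) and args[0][0] == 0
                and is_formula(args[1], allow_quantifiers))
    if tag == 3:
        return len(args) == 1 and is_formula(args[0], allow_quantifiers)
    if tag == 4:
        return len(args) == 2 and all(is_formula(a, allow_quantifiers) for a in args)
    return False

zero = (5,)
v0 = (0, 0)
eq = (1, v0, zero)          # an atomic formula built from two terms
quantified = (2, v0, eq)    # a quantified formula
```

Each recursive call is made on a strictly smaller component, which is the role played by the bounded quantifiers in the definitions above.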


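Next we want to show that the set of logical tautologies is $\Delta_1$. This will be done by
formalizing the concept of truth tables, which will require some development. First we show that
$\mathrm{AtForms}(a)$, which is a sequence containing the (unique) atomic formulas of $a$, is $\Delta_1$. Define
it by:


\[ \mathrm{AtForms}(a, t) \leftrightarrow (\neg \mathrm{Form}(a) \wedge t = 0) \vee \]
\[ \mathrm{Form}(a) \wedge ( \]
\[ \exists x, y < a [a = \langle 1, x, y \rangle \wedge t = a] \vee \]
\[ \exists x, y < a [a = \langle 7, x, y \rangle \wedge t = a] \vee \]
\[ \exists i, x < a [a = \langle 2, i, x \rangle \wedge t = \mathrm{AtForms}(x)] \vee \]
\[ \exists x < a [a = \langle 3, x \rangle \wedge t = \mathrm{AtForms}(x)] \vee \]
\[ \exists x, y < a [a = \langle 4, x, y \rangle \wedge t = \mathrm{AtForms}(x) \ast_u \mathrm{AtForms}(y)]) \]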


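We say $v$ is a truth assignment if it is a sequence of pairs with the first member of each
pair being an atomic formula and the second being either 1 or 0:


\[ \mathrm{TA}(v) \leftrightarrow \forall i < \mathrm{len}(v) \, \exists x, y < (v)_i [(v)_i = \langle x, y \rangle \wedge \mathrm{AtForm}(x) \wedge (y = 1 \vee y = 0)] \]


Then $v$ is a truth assignment for $a$ if $v$ is a truth assignment, $a$ is quantifier free, and every
atomic formula in $a$ is the first member of one of the pairs in $v$. That is:


\[ \mathrm{TAf}(v, a) \leftrightarrow \mathrm{TA}(v) \wedge \mathrm{QFForm}(a) \wedge \forall i < \mathrm{len}(\mathrm{AtForms}(a)) \, \exists j < \mathrm{len}(v) [((v)_j)_0 = (\mathrm{AtForms}(a))_i] \]


Then we can define when $v$ makes $a$ true by:


\[ \mathrm{True}(v, a) \leftrightarrow \mathrm{TAf}(v, a) \wedge ( \]
\[ \mathrm{AtForm}(a) \wedge \exists i < \mathrm{len}(v) [((v)_i)_0 = a \wedge ((v)_i)_1 = 1] \vee \]
\[ \exists y < a [a = \langle 3, y \rangle \wedge \neg \mathrm{True}(v, y)] \vee \]
\[ \exists y, z < a [a = \langle 4, y, z \rangle \wedge (\mathrm{True}(v, y) \to \mathrm{True}(v, z))]) \]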
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Then a is a tautology if every truth assignment makes it true: 


Taut(a)Vu < ii [TAf(v,a) > True(v, a)] 
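

The truth-table idea behind this definition can be sketched directly on nested tuples. The assumptions here are that codes are Python tuples rather than numbers, that tags 1 and 7 mark atomic formulas, tag 3 negation and tag 4 implication, and that the input is quantifier free; the names `atoms`, `true_under` and `is_tautology` are hypothetical.

```python
from itertools import product

def atoms(a):
    """Distinct atomic subformulas of a quantifier-free code, in order of appearance."""
    if a[0] in (1, 7):
        return [a]
    found = []
    for sub in a[1:]:
        for atom in atoms(sub):
            if atom not in found:
                found.append(atom)
    return found

def true_under(v, a):
    """Does the assignment v (a dict from atomic codes to 0 or 1) make a true?"""
    tag = a[0]
    if tag in (1, 7):
        return v[a] == 1
    if tag == 3:
        return not true_under(v, a[1])
    if tag == 4:
        return (not true_under(v, a[1])) or true_under(v, a[2])
    raise ValueError("not a quantifier-free formula code")

def is_tautology(a):
    """True when every assignment to the atoms of a makes a true."""
    ats = atoms(a)
    return all(true_under(dict(zip(ats, vals)), a)
               for vals in product((0, 1), repeat=len(ats)))

A = (1, (0, 0), (5,))   # an atomic formula
B = (7, (0, 1), (5,))   # another atomic formula
```

Only finitely many assignments need checking, one for each choice of 0 or 1 on the atoms of the formula, which is why the quantifier over assignments can be bounded.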


We say that a number $a$ is a deduction of $\varphi$ if it encodes a proof of $\varphi$ from a set of
axioms $Ax$. This means that $a$ is a sequence where for each $(a)_i$ either:


• $(a)_i$ is the Gödel number of an axiom,
• $(a)_i$ is a logical tautology, or
• there are some $j, k < i$ such that $(a)_j = \langle 4, (a)_k, (a)_i \rangle$ (that is, $(a)_i$ is a conclusion
under modus ponens from $(a)_j$ and $(a)_k$),


and the last element of $a$ is $\ulcorner\varphi\urcorner$.


If $Ax$ is $\Delta_1$ (almost every system of axioms, including PA, is $\Delta_1$) then $\mathrm{Proves}(a, x)$, which is
true if $a$ is a deduction whose last value is $x$, is also $\Delta_1$. This is fairly simple to see from the
above results (let $Ax(x)$ be the relation specifying that $x$ is the Gödel number of an axiom):


\[ \mathrm{Proves}(a, x) \leftrightarrow \forall i < \mathrm{len}(a) [Ax((a)_i) \vee \exists j, k < i [(a)_j = \langle 4, (a)_k, (a)_i \rangle] \vee \mathrm{Taut}((a)_i)] \wedge (a)_{\mathrm{len}(a) - 1} = x \]
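

The three cases translate into a short checker. This is a minimal sketch, assuming that formula codes are nested Python tuples with tag 4 for implication and that the caller supplies the axiom test and the tautology test; the name `proves` and its parameters are hypothetical.

```python
def proves(a, x, is_axiom, is_tautology):
    """Check that the sequence `a` of formula codes is a deduction whose last entry is `x`.

    Every entry must be an axiom, a tautology, or follow by modus ponens from
    two earlier entries, one of which has the form (4, other, entry).
    """
    if not a or a[-1] != x:
        return False
    for i, step in enumerate(a):
        earlier = a[:i]
        by_modus_ponens = any(premise == (4, other, step)
                              for premise in earlier for other in earlier)
        if not (is_axiom(step) or is_tautology(step) or by_modus_ponens):
            return False
    return True

A = (1, (0, 0), (5,))
B = (7, (5,), (0, 0))
axioms = {A, (4, A, B)}             # a toy axiom set, for illustration only
no_tautologies = lambda f: False    # placeholder tautology test

ok = proves([A, (4, A, B), B], B, axioms.__contains__, no_tautologies)
```

Every check only inspects entries of the sequence itself, matching the bounded quantifiers in the formula above.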
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14.16 example of Godel numbering 


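We can define by recursion a function $e$ from formulas of arithmetic to numbers, and the
corresponding Gödel numbering as the inverse.


The symbols of the language of arithmetic are $=$, $\forall$, $\neg$, $\to$, $0$, $S$, $<$, $+$, $\cdot$, the variables $v_i$
for any integer $i$, and ( and ). ( and ) are only used to define the order of operations, and
should be inferred where appropriate in the definition below.


We can define a function $e$ by recursion as follows:


• $e(v_i) = \langle 0, i \rangle$
• $e(\varphi = \psi) = \langle 1, e(\varphi), e(\psi) \rangle$
• $e(\forall v_i \varphi) = \langle 2, e(v_i), e(\varphi) \rangle$
• $e(\neg\varphi) = \langle 3, e(\varphi) \rangle$
• $e(\varphi \to \psi) = \langle 4, e(\varphi), e(\psi) \rangle$
• $e(0) = \langle 5 \rangle$
• $e(S\varphi) = \langle 6, e(\varphi) \rangle$
• $e(\varphi < \psi) = \langle 7, e(\varphi), e(\psi) \rangle$
• $e(\varphi + \psi) = \langle 8, e(\varphi), e(\psi) \rangle$
• $e(\varphi \cdot \psi) = \langle 9, e(\varphi), e(\psi) \rangle$


$e^{-1}$ is a Gödel numbering, with $\ulcorner\varphi\urcorner = e(\varphi)$.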
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14.17 example of well-founded induction 


As an example of the use of well-founded induction in the case where the order is not a
linear one, I’ll prove the fundamental theorem of arithmetic: every natural number has a
prime factorization.


First note that the division relation is well-founded. This fact is proven in standard
textbooks. The |-minimal elements are the prime numbers. We detail the two steps of the proof:


1. If $n$ is prime, then $n$ is its own factorization into primes, so the assertion is true for
the |-minimal elements.


2. If $n$ is not prime, then $n$ has a non-trivial factorization (by definition of not being
prime), i.e. $n = m\ell$, where $m, \ell \neq 1$. By induction, $m$ and $\ell$ have prime factorizations,
and we can see that this implies that $n$ has one too. This takes care of case 2.
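

The two steps of the proof correspond to a recursion on the divisibility order. The following is a minimal sketch for $n \geq 2$; the function name `prime_factorization` is chosen here for illustration.

```python
from math import isqrt

def prime_factorization(n):
    """Factor n >= 2 by recursion on divisibility, mirroring the two proof steps."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for m in range(2, isqrt(n) + 1):
        if n % m == 0:
            # Step 2: n = m * l with m and l both different from 1 and from n,
            # so both lie strictly below n in the divisibility order.
            return prime_factorization(m) + prime_factorization(n // m)
    # Step 1: no non-trivial divisor exists, so n is minimal, i.e. prime.
    return [n]

factors = prime_factorization(360)
```

The recursion terminates because each call is made on a proper divisor, which is exactly the well-foundedness of the division relation.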


Here are other commonly used well-founded sets:


1. ideals of a Noetherian ring ordered by inverse proper inclusion;
2. ideals of an Artinian ring ordered by inclusion;
3. graphs ordered by minors (a graph $A$ is a minor of $B$ if and only if it can be obtained
from $B$ by collapsing edges);
4. ordinal numbers;
5. etc.
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14.18 first order language 


Terms and formulas of first order logic are constructed with the classical logical symbols $\forall$, $\exists$,
$\wedge$, $\vee$, $\neg$, $\Rightarrow$, $\Leftrightarrow$, and also ( and ), and a set


\[ \Sigma = \Big( \bigcup_{n \in \omega} Rel_n \Big) \cup \Big( \bigcup_{n \in \omega} Fun_n \Big) \cup Const \]


where for each natural number $n$,


• $Rel_n$ is a (usually countable) set of $n$-ary relation symbols.
• $Fun_n$ is a (usually countable) set of $n$-ary function symbols.
• $Const$ is a (usually countable) set of constant symbols.


We require that all these sets be disjoint. The elements of the set $\Sigma$ are the only non-logical
symbols that we are allowed to use when we construct terms and formulas. They form the
signature of the language. So far they are only symbols, so they don’t mean anything. For
most structures that we encounter, the set $\Sigma$ is finite, but we allow it to be infinite, even
uncountable, as this sometimes makes things easier, and just about everything still works
when the signature is uncountable. We also assume that we have an unlimited supply of
variables, with the only constraint that the collection of variables form a set, which should
be disjoint from the other sets of non-logical symbols.


The arity of a function or relation symbol is the number of parameters the symbol
takes. It is usually assumed to be a property of the symbol, and it is bad grammar to use
an $n$-ary function or relation with $m$ parameters if $m \neq n$.


Terms are built inductively according to the following rules:


1. Any variable is a term;
2. Any constant symbol is a term;
3. If $f$ is an $n$-ary function symbol, and $t_1, \ldots, t_n$ are terms, then $f(t_1, \ldots, t_n)$ is a term.


With terms in hand, we build formulas inductively by a finite application of the following
rules:


1. If $t_1$ and $t_2$ are terms, then $t_1 = t_2$ is a formula;
2. If $R$ is an $n$-ary relation symbol and $t_1, \ldots, t_n$ are terms, then $R(t_1, \ldots, t_n)$ is a formula;
3. If $\varphi$ is a formula, then so is $\neg\varphi$;
4. If $\varphi$ and $\psi$ are formulas, then so is $\varphi \vee \psi$;
5. If $\varphi$ is a formula, and $x$ is a variable, then $\exists x(\varphi)$ is a formula.


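The other logical symbols are obtained in the following way:


\[ \varphi \wedge \psi := \neg(\neg\varphi \vee \neg\psi) \qquad \varphi \Rightarrow \psi := \neg\varphi \vee \psi \]
\[ \varphi \Leftrightarrow \psi := (\varphi \Rightarrow \psi) \wedge (\psi \Rightarrow \varphi) \qquad \forall x.\varphi := \neg(\exists x(\neg\varphi)) \]


All logical symbols are used when building formulas.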
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14.19 first order logic 


A logic is first order if it has exactly one type. Usually the term refers specifically to the
logic with connectives $\neg$, $\vee$, $\wedge$, $\to$, and $\leftrightarrow$ and the quantifiers $\forall$ and $\exists$, all given the usual
semantics:


• $\neg\varphi$ is true iff $\varphi$ is not true
• $\varphi \vee \psi$ is true if either $\varphi$ is true or $\psi$ is true
• $\forall x \varphi(x)$ is true iff $\varphi^t_x$ is true for every object $t$ (where $\varphi^t_x$ is the result of replacing every
unbound occurrence of $x$ in $\varphi$ with $t$)
• $\varphi \wedge \psi$ is the same as $\neg(\neg\varphi \vee \neg\psi)$
• $\varphi \to \psi$ is the same as $(\neg\varphi) \vee \psi$
• $\varphi \leftrightarrow \psi$ is the same as $(\varphi \to \psi) \wedge (\psi \to \varphi)$
• $\exists x \varphi(x)$ is the same as $\neg\forall x \neg\varphi(x)$


However, logics with slightly different quantifiers and connectives are sometimes still
called first order as long as there is only one type.


Version: 4 Owner: Henry Author(s): Henry 


14.20 first order theories 


Let $L$ be a first-order language. A theory in $L$ is a set of sentences of $L$, i.e. a set of
formulas of $L$ that have no free variables.


Definition. A theory $T$ is said to be consistent if and only if $T \nvdash \bot$, where $\bot$ stands for
“false”. In other words, $T$ is consistent if one cannot derive a contradiction from it. If $\varphi$ is
a sentence of $L$, then we say $\varphi$ is consistent with $T$ if and only if the theory $T \cup \{\varphi\}$ is
consistent.


Definition. A theory $T \subseteq L$ is said to be complete if and only if for every sentence $\varphi \in L$,
either $T \vdash \varphi$ or $T \vdash \neg\varphi$.


A theory $T$ in $L$ is complete if and only if it is maximal consistent. In other words,
$T$ is complete if and only if for every $\varphi \notin T$, $T \cup \{\varphi\}$ is inconsistent.


Theorem. (Tarski) Every consistent theory $T$ in $L$ can be extended to a complete theory.
Proof: Use Zorn’s lemma on the collection of consistent theories extending $T$. $\Box$
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14.21 free and bound variables 


In the entry first-order languages, I have mentioned the use of variables without
mentioning what variables really are. A variable is a symbol that is supposed to range over the
universe of discourse. Unlike a constant, it has no fixed value.


There are two ways in which a variable can occur in a formula: free or bound. Informally,
a variable is said to occur free in a formula if and only if it is not within the “scope” of a
quantifier. For instance, $x$ occurs free in $\varphi$ if and only if it occurs in it as a symbol, and no
subformula of $\varphi$ is of the form $\exists x.\psi$. Here the $x$ after the $\exists$ is to be taken literally: it is $x$
and no other symbol.


The set $FV(\varphi)$ of free variables of $\varphi$ is defined by well-founded induction on the construction
of formulas. First we define $Var(t)$, where $t$ is a term, to be the set of all variables occurring
in $t$, and then:


\[ FV(t_1 = t_2) = Var(t_1) \cup Var(t_2) \]
\[ FV(R(t_1, \ldots, t_n)) = Var(t_1) \cup \ldots \cup Var(t_n) \]
\[ FV(\neg\varphi) = FV(\varphi) \]
\[ FV(\varphi \vee \psi) = FV(\varphi) \cup FV(\psi) \]
\[ FV(\exists x(\varphi)) = FV(\varphi) \setminus \{x\} \]


When for some $\varphi$ the set $FV(\varphi)$ is not empty, it is customary to write $\varphi$ as $\varphi(x_1, \ldots, x_n)$,
in order to stress the fact that there are some free variables left in $\varphi$, and that those free
variables are among $x_1, \ldots, x_n$. When $x_1, \ldots, x_n$ appear free in $\varphi$, they are considered as
place-holders, and it is understood that we will have to supply “values” for them when
we want to determine the truth of $\varphi$. If $FV(\varphi) = \emptyset$, then $\varphi$ is called a sentence.
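

The inductive definition of the set of free variables can be followed clause by clause. The sketch below assumes a tuple representation chosen for illustration: variables are strings, terms are `("fn", name, *args)`, and formulas are tagged `"="`, `"rel"`, `"not"`, `"or"`, `"exists"`; the tags `"and"` and `"forall"` are added as conveniences and treated like their primitive counterparts.

```python
def var(t):
    """Var(t): the set of variables occurring in the term t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(var(arg) for arg in t[2:]))

def fv(f):
    """FV(f): the set of free variables of the formula f."""
    tag = f[0]
    if tag == "=":
        return var(f[1]) | var(f[2])
    if tag == "rel":                       # ("rel", R, t1, ..., tn)
        return set().union(*(var(t) for t in f[2:]))
    if tag == "not":
        return fv(f[1])
    if tag in ("or", "and"):
        return fv(f[1]) | fv(f[2])
    if tag in ("exists", "forall"):        # (quantifier, x, body)
        return fv(f[2]) - {f[1]}
    raise ValueError(f"unknown formula tag: {tag}")

one = ("fn", "1")
zero = ("fn", "0")

def plus(a, b):
    return ("fn", "+", a, b)

# x + 1 = 0  and  exists x (x + y = 1)
left = ("=", plus("x", one), zero)
right = ("exists", "x", ("=", plus("x", "y"), one))
both = ("and", left, right)
```

A formula `f` is a sentence exactly when `fv(f)` is empty.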


If a variable never occurs free in $\varphi$ (and occurs as a symbol), then we say the variable is
bound. A variable $x$ is bound if and only if $\exists x(\psi)$ or $\forall x(\psi)$ is a subformula of $\varphi$ for some $\psi$.


The problem with this definition is that a variable can occur both free and bound in the
same formula. For example, consider the following formula of the language $\{+, \cdot, 0, 1\}$ of
rings:


\[ x + 1 = 0 \wedge \exists x (x + y = 1) \]


The variable $x$ occurs both free and bound here. However, the following lemma tells us that
we can always avoid this situation:


Lemma 1. It is possible to rename the bound variables without affecting the truth of a
formula. In other words, if $\varphi = \exists x(\psi)$ and $z$ is a variable not occurring in $\psi$, then
$\vdash \varphi \Leftrightarrow \exists z(\psi(z/x))$, where $\psi(z/x)$ is the formula obtained from $\psi$ by replacing every free
occurrence of $x$ by $z$; the same holds with $\forall$ in place of $\exists$.
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14.22 generalized quantifier 


Generalized quantifiers are an abstract way of defining quantifiers.


The underlying principle is that formulas quantified by a generalized quantifier are true if
the set of elements satisfying those formulas belongs to some relation associated with the
quantifier.


Every generalized quantifier has an arity, which is the number of formulas it takes as
arguments, and a type, which for an $n$-ary quantifier is a tuple of length $n$. The tuple represents
the number of quantified variables for each argument.


The most common quantifiers are those of type $\langle 1 \rangle$, including $\forall$ and $\exists$. If $Q$ is a quantifier
of type $\langle 1 \rangle$, $M$ is the universe of a model and $Q_M$ is the relation associated with $Q$ in that
model, then $Qx\varphi(x) \leftrightarrow \{x \in M \mid \varphi(x)\} \in Q_M$.


So $\forall_M = \{M\}$, since the quantified formula is only true when all elements satisfy it. On the
other hand $\exists_M = P(M) - \{\emptyset\}$.


In general, the monadic quantifiers are those of type $\langle 1, \ldots, 1 \rangle$, and if $Q$ is an $n$-ary monadic
quantifier then $Q_M \subseteq P(M)^n$. Härtig’s quantifier, for instance, is $\langle 1, 1 \rangle$, and $I_M = \{\langle X, Y \rangle \mid
X, Y \subseteq M \wedge |X| = |Y|\}$.


A quantifier $Q$ is polyadic if it is of type $\langle n_1, \ldots, n_n \rangle$ where each $n_i \in \mathbb{N}$. Then:


\[ Q_M \subseteq \prod_i P(M^{n_i}) \]


These can get quite elaborate; $Wxy\varphi(x, y)$ is a $\langle 2 \rangle$ quantifier where $X \in W_M \leftrightarrow X$ is a
well-ordering. That is, it is true if the set of pairs making $\varphi$ true is a well-ordering.
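

On a finite universe the relations associated with the type $\langle 1 \rangle$ quantifiers and with Härtig’s quantifier can be listed explicitly. This is a sketch for illustration only; the function names are hypothetical and the universe `M` is an arbitrary small example.

```python
from itertools import combinations

def powerset(M):
    """All subsets of M, as frozensets."""
    items = list(M)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def forall_M(M):
    """The relation for the universal quantifier: only M itself."""
    return {frozenset(M)}

def exists_M(M):
    """The relation for the existential quantifier: every non-empty subset."""
    return powerset(M) - {frozenset()}

def hartig_M(M):
    """The relation for Härtig's quantifier: pairs of subsets of equal size."""
    subsets = powerset(M)
    return {(X, Y) for X in subsets for Y in subsets if len(X) == len(Y)}

def holds(Q_M, M, phi):
    """Q x phi(x) is true iff {x in M | phi(x)} belongs to Q_M."""
    return frozenset(x for x in M if phi(x)) in Q_M

M = {0, 1, 2, 3}
evens = frozenset(x for x in M if x % 2 == 0)
odds = frozenset(x for x in M if x % 2 == 1)
```

For example, `holds(exists_M(M), M, lambda x: x > 2)` is true because the set `{3}` is a non-empty subset of `M`.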
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14.23 logic 


Generally, by logic, people mean first order logic, a formal set of rules for building
mathematical statements out of symbols like $\neg$ (negation) and $\to$ (implication) along with quantifiers
like $\forall$ (for every) and $\exists$ (there exists).


More generally, a logic is any set of rules for forming sentences (the logic’s syntax) together
with rules for assigning truth values to them (the logic’s semantics). Normally it includes
a (possibly empty) set of types $T$ (also called sorts), which represent the different kinds of
objects that the theory discusses (typical examples might be sets, numbers, or sets of
numbers). In addition it specifies particular quantifiers, connectives, and variables. Particular
theories in the logic can then add relations and functions to fully specify a language.
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14.24 proof of compactness theorem for first order logic 


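The theorem states that if a set of sentences of a first-order language $L$ is inconsistent, then
some finite subset of it is inconsistent. Suppose $\Delta \subseteq L$ is inconsistent. Then by definition
$\Delta \vdash \bot$, i.e. there is a formal proof of “false” using only assumptions from $\Delta$. Formal proofs
are finite objects, so let $\Gamma$ collect all the formulas of $\Delta$ that are used in the proof. Then $\Gamma$
is a finite subset of $\Delta$ and $\Gamma \vdash \bot$, so $\Gamma$ is inconsistent.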
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14.25 proof of principle of transfinite induction 


To prove the transfinite induction theorem, we note that the class of ordinals is well-ordered
by $\in$. So suppose for some $\Phi$, there are ordinals $\alpha$ such that $\Phi(\alpha)$ is not true. Suppose
further that $\Phi$ satisfies the hypothesis, i.e. $\forall\alpha(\forall\beta < \alpha(\Phi(\beta)) \Rightarrow \Phi(\alpha))$. We will reach a
contradiction.


The class $C = \{\alpha : \neg\Phi(\alpha)\}$ is not empty. Note that it may be a proper class, but this is not
important. Let $\gamma = \min(C)$ be the $\in$-minimal element of $C$. Then by assumption, for every
$\lambda < \gamma$, $\Phi(\lambda)$ is true. Thus, by hypothesis, $\Phi(\gamma)$ is true, a contradiction.
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14.26 proof of the well-founded induction principle 


This proof is very similar to the proof of the transfinite induction theorem. Suppose $\Phi$ is
defined for a well-founded set $(S, R)$, and suppose $\Phi$ is not true for every $a \in S$. Assume
further that $\Phi$ satisfies requirements 1 and 2 of the statement. Since $R$ is a well-founded
relation, the set $\{a \in S : \neg\Phi(a)\}$ has an $R$-minimal element $r$. This element is either an
$R$-minimal element of $S$ itself, in which case condition 1 is violated, or it has $R$-predecessors. In
this case, we have by minimality $\Phi(s)$ for every $s$ such that $sRr$, and by condition 2, $\Phi(r)$ is
true, a contradiction.
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14.27 quantifier 


A quantifier is a logical symbol which makes an assertion about the set of values which
make one or more formulas true. This is an exceedingly general concept; the vast majority of
mathematics is done with the two standard quantifiers, $\forall$ and $\exists$.


The universal quantifier $\forall$ takes a variable and a formula and asserts that the formula
holds for any value of the variable. A typical example would be a sentence like:


\[ \forall x [0 \leq x] \]


which states that no matter what value $x$ takes, $0 \leq x$.


The existential quantifier $\exists$ is the dual; that is, the formula $\forall x \varphi(x)$ is equivalent to
$\neg\exists x \neg\varphi(x)$. It states that there is some $x$ satisfying the formula, as in


\[ \exists x [x > 0] \]


which states that there is some value of $x$ greater than 0.


The scope of a quantifier is the portion of a formula where it binds its variables. Note
that previous bindings of a variable are overridden within the scope of a quantifier. In the
examples above, the scope of the quantifiers was the entire formula, but that need not be
the case. The following is a more complicated use of quantifiers:


\[ \forall x [x = 0 \vee \exists y [x = y + 1 \wedge (y = 0 \vee \exists x [y = x + 1])]] \]


• The scope of the universal quantifier is the whole bracket following $\forall x$.
• The scope of the first existential quantifier is the bracket following $\exists y$.
• The scope of the second existential quantifier is the innermost bracket, $[y = x + 1]$. Within
this area, all references to $x$ refer to the variable bound by the existential quantifier. It is
impossible to refer directly to the one bound by the universal quantifier.


As that example illustrates, it can be very confusing when one quantifier overrides another.
Since it does not change the meaning of a sentence to change a bound variable and all bound
occurrences of it, it is better form to replace sentences like that with an equivalent but more
readable one like:


\[ \forall x [x = 0 \vee \exists y [x = y + 1 \wedge (y = 0 \vee \exists z [y = z + 1])]] \]


These sentences both assert that every number is either equal to zero, or that there is some
number one less than it, and that the number one less than it is also either zero or has
a number one less than it. [Note: This is not the most useful of sentences. It would be
nice to replace this with a mathematically interesting sentence which uses nested quantifiers
meaningfully.]


The quantifiers may not range over all objects. That is, $\forall x \varphi(x)$ may not specify that $x$
can be any object, but rather any object belonging to some class of objects. Similarly
$\exists x \varphi(x)$ may specify that there is some $x$ within that class which satisfies $\varphi$. For instance
second order logic has two universal quantifiers, $\forall^1$ and $\forall^2$ (with corresponding existential
quantifiers), and variables bound by them only range over the first and second order objects
respectively. So $\forall^1 x [0 \leq x]$ only states that all numbers are greater than or equal to 0, not
that sets of numbers are as well (which would be meaningless).


A particular use of a quantifier is called bounded or restricted if it restricts the objects
to a smaller range. This is not quite the same as the situation mentioned above; in the
situation above, the definition of the quantifier does not include all objects. In this case,
quantifiers can range over everything, but in a particular formula they do not. This is expressed
in first order logic with formulas like these four:


\[ \forall x [x < c \to \varphi(x)] \qquad \forall x [x \in X \to \varphi(x)] \qquad \exists x [x < c \wedge \varphi(x)] \qquad \exists x [x \in X \wedge \varphi(x)] \]


The restriction is often incorporated into the quantifier. For instance the first example might
be written $\forall x < c [\varphi(x)]$.


A quantifier is called vacuous if the variable it binds does not appear anywhere in its scope,
such as $\forall x \exists y [0 \leq x]$, where $\exists y$ is vacuous. While vacuous quantifiers do not change the meaning of a sentence,
they are occasionally useful in finding an equivalent formula of a specific form.


While these are the most common quantifiers (in particular, they are the only quantifiers
appearing in classical first-order logic), some logics use others. The quantifier $\exists! x \varphi(x)$, which
means that there is a unique $x$ satisfying $\varphi(x)$, is equivalent to $\exists x [\varphi(x) \wedge \forall y [\varphi(y) \to x = y]]$.


Other quantifiers go beyond the usual two. Examples include interpreting $Qx\varphi(x)$ to mean
there are an infinite (or uncountably infinite) number of $x$ satisfying $\varphi(x)$. More elaborate
examples include the branching Henkin quantifier, written:


\[ \begin{pmatrix} \forall x \, \exists y \\ \forall a \, \exists b \end{pmatrix} \varphi(x, y, a, b) \]


This quantifier is similar to $\forall x \exists y \forall a \exists b \varphi(x, y, a, b)$ except that the choice of $b$ may depend
only on $a$, and not on the values of $x$ and $y$. This concept can be further generalized to the
game-semantic, or independence-friendly, quantifiers. All of these quantifiers are examples of
generalized quantifiers.
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14.28 quantifier free 


Let $L$ be a first order language. A formula $\psi$ is quantifier free iff it contains no quantifiers.


Let $T$ be a complete $L$-theory. Let $S \subseteq L$. Then $S$ is an elimination set for $T$ iff for every
$\psi(\bar{x}) \in L$ there is some $\varphi(\bar{x}) \in S$ so that $T \vdash \forall \bar{x} (\psi(\bar{x}) \leftrightarrow \varphi(\bar{x}))$.


In particular, $T$ has quantifier elimination iff the set of quantifier free formulas is an
elimination set for $T$. In other words, $T$ has quantifier elimination iff for every $\psi(\bar{x}) \in L$ there is
some quantifier free $\varphi(\bar{x}) \in L$ so that $T \vdash \forall \bar{x} (\psi(\bar{x}) \leftrightarrow \varphi(\bar{x}))$.
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14.29 subformula 


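Let $L$ be a first order language and suppose $\varphi, \psi \in L$ are formulas. Then we say that $\varphi$ is a
subformula of $\psi$ if and only if one of the following holds:


1. $\psi = \varphi$;
2. $\psi$ is one of $\neg\alpha$, $\forall x(\alpha)$ or $\exists x(\alpha)$, and either $\varphi = \alpha$, or $\varphi$ is a subformula of $\alpha$;
3. $\psi$ is $\alpha \vee \beta$ or $\alpha \wedge \beta$, and either $\varphi = \alpha$, $\varphi = \beta$, or $\varphi$ is a subformula of $\alpha$ or $\beta$.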
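

The three clauses give a direct recursive enumeration of the subformulas of a formula. This is a minimal sketch on a tuple representation assumed here for illustration: formulas are tagged `"not"`, `"forall"`, `"exists"`, `"or"`, `"and"`, and any other tag is treated as an atomic formula.

```python
def subformulas(psi):
    """Yield every subformula of psi, starting with psi itself (clause 1)."""
    yield psi
    tag = psi[0]
    if tag == "not":                       # clause 2, negation
        yield from subformulas(psi[1])
    elif tag in ("forall", "exists"):      # clause 2, quantifiers: (tag, x, body)
        yield from subformulas(psi[2])
    elif tag in ("or", "and"):             # clause 3
        yield from subformulas(psi[1])
        yield from subformulas(psi[2])

def is_subformula(phi, psi):
    return any(phi == sub for sub in subformulas(psi))

P = ("atom", "P")
Q = ("atom", "Q")
example = ("not", ("or", P, ("exists", "x", Q)))
```

The enumeration visits each node of the formula tree once, so a formula has only finitely many subformulas.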
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14.30 syntactic compactness theorem for first order 
logic 


Let $L$ be a first-order language, and $\Delta \subseteq L$ be a set of sentences. If $\Delta$ is inconsistent, then
some finite $\Gamma \subseteq \Delta$ is inconsistent.
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14.31 transfinite induction 


Suppose $\Phi(\alpha)$ is a property defined for every ordinal $\alpha$. The principle of transfinite
induction states that if, for every $\alpha$, the fact that $\Phi(\beta)$ is true for every $\beta < \alpha$
implies that $\Phi(\alpha)$ is true, then $\Phi(\alpha)$ is true for every ordinal $\alpha$. Formally:


\[ \forall\alpha (\forall\beta (\beta < \alpha \Rightarrow \Phi(\beta)) \Rightarrow \Phi(\alpha)) \Rightarrow \forall\alpha (\Phi(\alpha)) \]


The principle of transfinite induction is very similar to the principle of finite induction,
except that it is stated in terms of the whole class of the ordinals.
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14.32 universal relation 


If $\Phi$ is a class of $n$-ary relations with $\vec{x}$ as the only free variables, an $n + 1$-ary formula $\psi$ is
universal for $\Phi$ if for any $\varphi \in \Phi$ there is some $e$ such that $\psi(e, \vec{x}) \leftrightarrow \varphi(\vec{x})$. In other words,
$\psi$ can simulate any element of $\Phi$.


Similarly, if $\Phi$ is a class of functions of $\vec{x}$, a formula $\psi$ is universal for $\Phi$ if for any $\varphi \in \Phi$ there
is some $e$ such that $\psi(e, \vec{x}) = \varphi(\vec{x})$.
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14.33 universal relations exist for each level of the arith- 
metical hierarchy 


Let $L \in \{\Sigma_n, \Delta_n, \Pi_n\}$ and take any $k \in \mathbb{N}$. Then there is a $k + 1$-ary relation $U \in L$ such
that $U$ is universal for the $k$-ary relations in $L$.


Proof


First we prove the case where $L = \Delta_1$, the recursive relations. We use the example of a
Gödel numbering.


Define $T$ to be a $k + 2$-ary relation such that $T(e, \vec{x}, a)$ if:


• $e = \ulcorner\varphi\urcorner$
• $a$ is a deduction of either $\varphi(\vec{x})$ or $\neg\varphi(\vec{x})$


Since deductions are $\Delta_1$, it follows that $T$ is $\Delta_1$. Then define $U'(e, \vec{x})$ to be the least $a$ such
that $T(e, \vec{x}, a)$ and $U(e, \vec{x}) \leftrightarrow (U'(e, \vec{x}))_{\mathrm{len}(U'(e, \vec{x}))} = e$. This is again $\Delta_1$ since the $\Delta_1$ functions
are closed under minimization.


If $f$ is any $k$-ary $\Delta_1$ function then $f(\vec{x}) = U(\ulcorner f \urcorner, \vec{x})$.


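Now take $L$ to be the $k$-ary relations in either $\Sigma_n$ or $\Pi_n$. Call the universal relation for
$k + n$-ary $\Delta_1$ relations $U_\Delta$. Then any $\varphi \in L$ is equivalent to a relation in the form
$Q y_1 Q' y_2 \cdots Q^* y_n \psi(\vec{x}, \vec{y})$ where $\psi \in \Delta_1$, and so $U(e, \vec{x}) = Q y_1 Q' y_2 \cdots Q^* y_n U_\Delta(e, \vec{x}, \vec{y})$. Then
$U$ is universal for $L$.


Finally, if $L$ is the $k$-ary $\Delta_n$ relations and $\varphi \in L$ then $\varphi$ is equivalent to relations of the form
$\exists y_1 \forall y_2 \cdots Q y_n \psi(\vec{x}, \vec{y})$ and $\forall z_1 \exists z_2 \cdots Q z_n \eta(\vec{x}, \vec{z})$. If the $k$-ary universal relations for $\Sigma_n$ and
$\Pi_n$ are $U_\Sigma$ and $U_\Pi$ respectively then $\varphi(\vec{x}) \leftrightarrow U_\Sigma(\ulcorner\psi\urcorner, \vec{x}) \wedge U_\Pi(\ulcorner\eta\urcorner, \vec{x})$.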
14.34 well-founded induction

The principle of well-founded induction is a generalization of the principle of transfinite induction.

Definition. Let $S$ be a non-empty set, and $R$ be a partial order relation on $S$. Then $R$ is said to be a well-founded relation if and only if every non-empty subset $X \subseteq S$ has an $R$-minimal element. In the special case where $R$ is a total order, we say $S$ is well-ordered by $R$. The pair $(S, R)$ is called a well-founded set.

Note that $R$ is by no means required to be a total order. A classical example of a well-founded set that is not totally ordered is the set $\mathbb{N}$ of natural numbers ordered by division, i.e. $aRb$ if and only if $a$ divides $b$ and $a \neq 1$. The $R$-minimal elements of this order are the prime numbers.

Let $\Phi$ be a property defined on a well-founded set $S$. The principle of well-founded induction states that if the following is true:

1. $\Phi$ is true for all the $R$-minimal elements of $S$

2. for every $a$, if for every $x \neq a$ such that $xRa$ we have $\Phi(x)$, then we have $\Phi(a)$

then $\Phi$ is true for every $a \in S$.


As an example of application of this principle, we mention the proof of the fundamental theorem of arithmetic: every natural number greater than 1 has a unique factorization into prime numbers. The proof goes by well-founded induction in the set $\mathbb{N}$ ordered by division.
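
The recursive shape of such a proof can be sketched in code. The following Python sketch (the function names are illustrative and not part of the original entry) covers only the existence half of the argument: a number with no proper divisor is minimal and is returned as is, and any other number is split into two strictly smaller factors that are handled recursively. Termination relies on the divisibility order being well-founded. Uniqueness is not checked here.

```python
def smallest_proper_divisor(n):
    """Return the least d with 1 < d < n and d dividing n, or None if there is none."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None


def factorize(n):
    """Return a list of primes whose product is n, by recursion on proper divisors."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = smallest_proper_divisor(n)
    if d is None:
        # n has no proper divisor: it is a minimal element (a prime).
        return [n]
    # Both d and n // d are proper divisors of n, hence strictly below n
    # in the divisibility order, so the recursion terminates.
    return factorize(d) + factorize(n // d)
```

For instance, `factorize(360)` walks down through 180, 90, 45, 15 and 5 before it reaches minimal elements only.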
14.35 well-founded induction on formulas

Let $L$ be a first-order language. The formulas of $L$ are built by a finite application of the rules of construction. This says that the relation $\leq$ defined on formulas by $\varphi \leq \psi$ if and only if $\varphi$ is a subformula of $\psi$ is a well-founded relation. Therefore, we can formulate a principle of induction for formulas as follows: suppose $P$ is a property defined on formulas, then $P$ is true for every formula of $L$ if and only if

1. $P$ is true for the atomic formulas;

2. for every formula $\varphi$, if $P$ is true for every proper subformula of $\varphi$, then $P$ is true for $\varphi$.
Chapter 15


03B15 — Higher-order logic and type 
theory 


15.1 Härtig’s quantifier

Härtig’s quantifier is a quantifier which takes two variables and two formulas, written $Ixy\varphi(x)\psi(y)$. It asserts that $|\{x \mid \varphi(x)\}| = |\{y \mid \psi(y)\}|$. That is, the cardinality of the values of $x$ which make $\varphi(x)$ true is the same as the cardinality of the values of $y$ which make $\psi(y)$ true. Viewed as a generalized quantifier, $I$ is a $\langle 2 \rangle$ quantifier.

Closely related is the Rescher quantifier, which also takes two variables and two formulas, is written $Jxy\varphi(x)\psi(y)$, and asserts that $|\{x \mid \varphi(x)\}| \leq |\{y \mid \psi(y)\}|$. The Rescher quantifier is sometimes defined instead to be a similar but different quantifier, $Jx\varphi(x) \leftrightarrow |\{x \mid \varphi(x)\}| > |\{x \mid \neg\varphi(x)\}|$. The first definition is a $\langle 2 \rangle$ quantifier while the second is a $\langle 1 \rangle$ quantifier.

Another similar quantifier is Chang’s quantifier $Q^C$, a $\langle 1 \rangle$ quantifier defined by $Q^C_M = \{X \subseteq M \mid |X| = |M|\}$. That is, $Q^C x\varphi(x)$ is true if the set of $x$ satisfying $\varphi$ has the same cardinality as the universe. For finite models this is the same as $\forall$, but for infinite ones it is not.
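
On a finite universe these cardinality quantifiers reduce to counting, which the following Python sketch illustrates. The function names, and the representation of formulas as Python predicates, are assumptions made for the illustration. Only Härtig’s quantifier, the one-formula ("most") variant of the Rescher quantifier, and Chang’s quantifier are covered.

```python
def count(universe, predicate):
    """Number of elements of the finite universe satisfying the predicate."""
    return sum(1 for x in universe if predicate(x))


def hartig(universe, phi, psi):
    """Hartig's quantifier: as many x satisfy phi as y satisfy psi."""
    return count(universe, phi) == count(universe, psi)


def rescher_most(universe, phi):
    """One-formula Rescher quantifier: more elements satisfy phi than fail it."""
    return count(universe, phi) > count(universe, lambda x: not phi(x))


def chang(universe, phi):
    """Chang's quantifier: the set satisfying phi is as large as the universe."""
    return count(universe, phi) == len(universe)


universe = list(range(10))
is_even = lambda x: x % 2 == 0
is_odd = lambda x: x % 2 == 1
```

On this finite universe `chang(universe, phi)` agrees with `all(phi(x) for x in universe)`, matching the remark that Chang’s quantifier coincides with the universal quantifier on finite models.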
15.2 Russell’s theory of types

After the discovery of the paradoxes of set theory (notably Russell’s paradox), it became apparent that naive set theory must be replaced by something in which the paradoxes can’t arise. Two solutions were proposed: type theory and axiomatic set theory based on a limitation of size principle (see the entries class and von Neumann-Bernays-Gödel set theory).

Type theory is based on the idea that impredicative definitions are the root of all evil. Bertrand Russell and various other logicians in the beginning of the 20th century proposed an analysis of the paradoxes that singled out so-called vicious circles as the culprits. A vicious circle arises when one attempts to define a class by quantifying over a totality of classes including the class being defined. For example, Russell’s class $R = \{x \mid x \notin x\}$ contains a variable $x$ that ranges over all classes.


Russell’s type theory, which is found in its mature form in the momentous Principia Mathematica, avoids the paradoxes by two devices. First, Frege’s fifth axiom is abandoned entirely: the extensions of predicates do not appear among the objects. Secondly, the predicates themselves are ordered into a ramified hierarchy so that the predicates at the lowest level can be defined by speaking of objects only, the predicates at the next level by speaking of objects and of predicates at the previous level, and so forth.

The first of these principles has drastic implications for mathematics. For example, the predicate “has the same cardinality” seemingly can’t be defined at all, for predicates apply only to objects, and not to other predicates. In Frege’s system this is easy to overcome: the equicardinality predicate is defined for extensions of predicates, which are objects. In order to overcome this, Russell introduced the notion of types (which are today known as degrees). Predicates of degree 1 apply only to objects, predicates of degree 2 apply to predicates of degree 1, and so forth.


The type theoretic universe may seem quite odd to someone familiar with the cumulative hierarchy of set theory. For example, the empty set appears anew in all degrees, as do various other familiar structures. Because of this, it is common to indicate only the relative differences in degrees when writing down a formula of type theory, instead of the absolute degrees. Thus instead of writing

$$\exists P_1 \forall x_0\,(x_0 \in P_1 \leftrightarrow x_0 \neq x_0)$$

one writes

$$\exists P_{i+1} \forall x_i\,(x_i \in P_{i+1} \leftrightarrow x_i \neq x_i)$$

to indicate that the formula holds for any $i$. Another possibility is simply to drop the subscripts indicating degree and let the degrees be determined implicitly (this can usually be done since we know that $x \in y$ implies that if $y$ is of degree $n$, then $x$ is of degree $n - 1$). A formula for which there is an assignment of types (degrees) to the variables and constants so that it accords with the restrictions of type theory is said to be stratified.
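
Whether a formula is stratified can be checked mechanically by propagating relative degrees. The Python sketch below is a simplification under stated assumptions: a formula is represented only by its atomic subformulas, given as pairs of variable names (a hypothetical encoding chosen for this illustration), and only the atoms $x \in y$ and $x = y$ are considered. Each atom $x \in y$ forces the degree of $y$ to be one more than that of $x$, and each atom $x = y$ forces equal degrees.

```python
from collections import defaultdict, deque


def stratify(membership_atoms, equality_atoms=()):
    """Return a dict of relative degrees, or None if the atoms cannot be stratified.

    membership_atoms: pairs (x, y) standing for the atom "x in y".
    equality_atoms:   pairs (x, y) standing for the atom "x = y".
    """
    # edges[u] holds pairs (v, delta) meaning degree[v] must equal degree[u] + delta.
    edges = defaultdict(list)
    for x, y in membership_atoms:
        edges[x].append((y, 1))
        edges[y].append((x, -1))
    for x, y in equality_atoms:
        edges[x].append((y, 0))
        edges[y].append((x, 0))

    degree = {}
    for start in list(edges):
        if start in degree:
            continue
        degree[start] = 0  # degrees are relative, so each component starts at 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, delta in edges[u]:
                if v not in degree:
                    degree[v] = degree[u] + delta
                    queue.append(v)
                elif degree[v] != degree[u] + delta:
                    return None  # conflicting constraints: not stratified
    return degree
```

With this encoding, the atom of the Russell class, `stratify([("x", "x")])`, yields `None`, while `stratify([], [("x", "x")])` succeeds.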


The second device implies another hierarchy in which the predicates are ordered. In any given degree, there appears a hierarchy of levels. At the first level of degree $n + 1$ one has predicates that apply to elements of degree $n$ and which can be defined with reference only to predicates of degree $n$. At the second level there appear all the predicates that can be defined with reference to predicates of degree $n$ and to predicates of degree $n + 1$ of level 1, and so forth.

This second principle makes virtually all mathematics break down. For example, when speaking of the real number system and its completeness, one wishes to quantify over all predicates of real numbers (this is possible at degree $n + 1$ if the predicates of real numbers appear at degree $n$), not only over those of a given level. In order to overcome this, Russell and Whitehead introduced in Principia Mathematica (PM) the so-called axiom of reducibility, which states that if a predicate $P_n$ occurs at some level $k$ (i.e. $P_n = P_n^k$), it occurs already on the first level.

Frank P. Ramsey was the first to notice that the axiom of reducibility in effect collapses the hierarchy of levels, so that the hierarchy is entirely superfluous in the presence of the axiom. The original form of type theory is known as ramified type theory, and the simpler alternative with no second hierarchy of levels is known as unramified type theory or simply as simple type theory.


One descendant of type theory is W. V. Quine’s system of set theory known as NF (New Foundations), which differs considerably from the more familiar set theories (ZFC, NBG, Morse-Kelley). In NF there is a class comprehension axiom saying that to any stratified formula there corresponds a set of elements satisfying the formula. The Russell class is not a set, since its definition contains the formula $x \in x$, which can’t be stratified, but the universal class is a set: $x = x$ is perfectly legal in type theory, as we can assign to $x$ any degree and get a well-formed formula of type theory. It is not known if NF axiomatises any extensor (see the entry class) based on a limitation of size principle, like the more familiar set theories do.


In the modern variants of type theory, one usually has a more general supply of types. Beginning with some set $\tau$ of types (presumably a division of the simple objects into some natural categories), one defines the set of types $T$ by setting

• if $a, b \in T$, then $(a \to b) \in T$

• for all $t \in \tau$, $t \in T$

One way to proceed to get something familiar is to have $\tau$ contain a type $t$ for truth values. Then sentences are objects of type $t$, open formulae of one variable are of type $\mathrm{Object} \to t$, and so forth. This sort of type system is often found in the study of typed lambda calculus and also in intensional logics, which are often based on the former.
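
The inductive definition of the set of types translates directly into a small data structure. The Python sketch below is illustrative only: the class names `Base` and `Arrow` and the helper `result_type` are assumptions introduced here, not notation from the entry. It builds the types for sentences and open formulae mentioned above and checks applications against them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    """A basic type taken from the starting set of types."""
    name: str


@dataclass(frozen=True)
class Arrow:
    """The function type (source -> target) built from two existing types."""
    source: object
    target: object


def result_type(function_type, argument_type):
    """Type obtained by applying a function type to an argument type, or None on mismatch."""
    if isinstance(function_type, Arrow) and function_type.source == argument_type:
        return function_type.target
    return None


OBJECT = Base("Object")
TRUTH = Base("t")                     # type of sentences
ONE_PLACE = Arrow(OBJECT, TRUTH)      # open formula of one variable
TWO_PLACE = Arrow(OBJECT, ONE_PLACE)  # open formula of two variables, curried
```

Applying `TWO_PLACE` to an object twice yields the truth value type `TRUTH`, while applying `TRUTH` to anything is rejected.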
15.3 analytic hierarchy

The analytic hierarchy is a hierarchy of either (depending on context) formulas or relations similar to the arithmetical hierarchy. It is essentially the second-order analogue. Like the arithmetical hierarchy, the relations in each level are exactly the relations defined by the formulas of that level.

The first level can be called $\Delta^1_0$, $\Sigma^1_0$, or $\Pi^1_0$, and consists of the arithmetical formulas or relations.

A formula $\varphi$ is $\Sigma^1_n$ if there is some arithmetical formula $\psi$ such that:

$$\varphi(\vec{k}) = \exists X_1 \forall X_2 \cdots Q X_n\, \psi(\vec{k}, \vec{X}_n)$$

where $Q$ is either $\forall$ or $\exists$, whichever maintains the pattern of alternating quantifiers, and each $X_i$ is a set variable.

Similarly, a formula $\varphi$ is $\Pi^1_n$ if there is some arithmetical formula $\psi$ such that:

$$\varphi(\vec{k}) = \forall X_1 \exists X_2 \cdots Q X_n\, \psi(\vec{k}, \vec{X}_n)$$

where $Q$ is either $\forall$ or $\exists$, whichever maintains the pattern of alternating quantifiers, and each $X_i$ is a set variable.


Version: 1 Owner: Henry Author(s): Henry 


15.4 game-theoretical quantifier 


A Henkin or branching quantifier is a multi-variable quantifier in which the selection of variables depends only on some, but not all, of the other quantified variables. For instance the simplest Henkin quantifier can be written:

$$\begin{pmatrix} \forall x\, \exists y \\ \forall a\, \exists b \end{pmatrix} \varphi(x, y, a, b)$$

This quantifier, inexpressible in ordinary first order logic, can best be understood by its Skolemization. The above is equivalent to $\exists f\, \exists g\, \forall x\, \forall a\, \varphi(x, f(x), a, g(a))$. Critically, the selection of $y$ depends only on $x$ while the selection of $b$ depends only on $a$.
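
On a finite domain the difference between the branching quantifier and the ordinary linear prefix can be tested by brute force. The Python sketch below is an illustration under assumptions: the function names are hypothetical, and the matrix of the formula is passed as a Python predicate of four arguments. `henkin` searches for one-argument choice functions $f$ and $g$, so that $y$ depends only on $x$ and $b$ only on $a$, while `linear` evaluates the ordinary prefix $\forall x \exists y \forall a \exists b$, where the choice of $b$ may also depend on $x$.

```python
from itertools import product


def all_functions(domain):
    """Yield every function from the finite domain to itself, as a dict."""
    domain = list(domain)
    for values in product(domain, repeat=len(domain)):
        yield dict(zip(domain, values))


def henkin(domain, phi):
    """Branching quantifier: exist f, g with phi(x, f(x), a, g(a)) for all x, a."""
    domain = list(domain)
    return any(
        all(phi(x, f[x], a, g[a]) for x in domain for a in domain)
        for f in all_functions(domain)
        for g in all_functions(domain)
    )


def linear(domain, phi):
    """Ordinary prefix: for all x exists y for all a exists b, phi(x, y, a, b)."""
    domain = list(domain)
    return all(
        any(all(any(phi(x, y, a, b) for b in domain) for a in domain) for y in domain)
        for x in domain
    )


domain = [0, 1]
copy_x = lambda x, y, a, b: b == x  # b must reproduce x
swap = lambda x, y, a, b: ((x == a) == (y == b)) and y != x
```

The predicate `copy_x` separates the two readings: the linear prefix satisfies it because $b$ may look at $x$, whereas the branching quantifier fails because $g$ sees only $a$.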


Logics with this quantifier are stronger than first order logic, lying between first and second order logic in strength. For instance the Henkin quantifier can be used to define the Rescher quantifier, and by extension Härtig’s quantifier:

$$\begin{pmatrix} \forall x\, \exists y \\ \forall a\, \exists b \end{pmatrix} [(x = a \leftrightarrow y = b) \wedge (\varphi(x) \rightarrow \psi(y))] \leftrightarrow Rxy\varphi(x)\psi(y)$$

To see that this is true, observe that this essentially requires that the Skolem functions $f(x) = y$ and $g(a) = b$ be the same, and moreover that they are injective. Then for each $x$ satisfying $\varphi(x)$, there is a different $f(x)$ satisfying $\psi(f(x))$.

This concept can be generalized to the game-theoretical quantifiers. It comes from interpreting a formula as a game between a “Prover” and a “Refuter.” A theorem is provable whenever the Prover has a winning strategy. At each $\wedge$ the Refuter chooses which side they will play (so the Prover must be prepared to win on either), while each $\vee$ is a choice for the Prover. At a $\neg$, the players switch roles. Then $\forall$ represents a choice for the Refuter and $\exists$ for the Prover.

Classical first order logic, then, adds the requirement that the games have perfect information. The game-theoretical quantifiers remove this requirement, so for instance the Henkin quantifier, which would be written $\forall x\, \exists y\, \forall a\, \exists/\forall x\; b\; \varphi(x, y, a, b)$, states that when the Prover makes a choice for $b$, it is made without knowledge of what was chosen at $x$.
15.5 logical language

In its most general form, a logical language is a set of rules for constructing formulas for some logic, which can then be assigned truth values based on the rules of that logic.

A logical language $\mathcal{L}$ consists of:

• A set $F$ of function symbols (common examples include $+$ and $\cdot$)
• A set $R$ of relation symbols (common examples include $=$ and $<$)
• A set $C$ of logical connectives (usually $\neg$, $\wedge$, $\vee$, $\rightarrow$ and $\leftrightarrow$)
• A set $Q$ of quantifiers (usually $\forall$ and $\exists$)
• A set $V$ of variables

Every function symbol, relation symbol, and connective is associated with an arity (the set of $n$-ary function symbols is denoted $F_n$, and similarly for relation symbols and connectives). Each quantifier is a generalized quantifier associated with a quantifier type $\langle n_1, \ldots, n_n \rangle$.

The underlying logic has a (possibly empty) set of types $T$. There is a function $\mathrm{Type} : F \cup V \to T$ which assigns a type to each function and variable. For each arity $n$ there is a function $\mathrm{Inputs}_n : F_n \cup R_n \to T^n$ which gives the types of each of the arguments to a function symbol or relation. In addition, for each quantifier type $\langle n_1, \ldots, n_n \rangle$ there is a function $\mathrm{Inputs}_{\langle n_1, \ldots, n_n \rangle}$ defined on $Q_{\langle n_1, \ldots, n_n \rangle}$ (the set of quantifiers of that type) which gives an $n$-tuple of $n_i$-tuples of types of the arguments taken by formulas the quantifier applies to.

The terms of $\mathcal{L}$ of type $t \in T$ are built as follows:

1. If $v$ is a variable such that $\mathrm{Type}(v) = t$ then $v$ is a term of type $t$

2. If $f$ is an $n$-ary function symbol such that $\mathrm{Type}(f) = t$ and $t_1, \ldots, t_n$ are terms such that for each $i \leq n$, $\mathrm{Type}(t_i) = (\mathrm{Inputs}_n(f))_i$, then $f t_1, \ldots, t_n$ is a term of type $t$

The formulas of $\mathcal{L}$ are built as follows:

1. If $r$ is an $n$-ary relation symbol and $t_1, \ldots, t_n$ are terms such that $\mathrm{Type}(t_i) = (\mathrm{Inputs}_n(r))_i$ then $r t_1, \ldots, t_n$ is a formula

2. If $c$ is an $n$-ary connective and $f_1, \ldots, f_n$ are formulas then $c f_1, \ldots, f_n$ is a formula

3. If $q$ is a quantifier of type $\langle n_1, \ldots, n_n \rangle$, $v_{1,1}, \ldots, v_{1,n_1}, v_{2,1}, \ldots, v_{n,1}, \ldots, v_{n,n_n}$ are a sequence of variables such that $\mathrm{Type}(v_{i,j}) = ((\mathrm{Inputs}_{\langle n_1, \ldots, n_n \rangle}(q))_i)_j$ and $f_1, \ldots, f_n$ are formulas then $q v_{1,1}, \ldots, v_{1,n_1}, v_{2,1}, \ldots, v_{n,1}, \ldots, v_{n,n_n} f_1, \ldots, f_n$ is a formula

Generally the connectives, quantifiers, and variables are specified by the appropriate logic, while the function and relation symbols are specified for particular languages. Note that 0-ary functions are usually called constants.
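
The two term-building rules amount to a recursive type check. The Python sketch below illustrates this for a tiny, entirely hypothetical two-typed signature: the symbols `zero`, `plus`, `card` and the type names `num` and `set` are assumptions invented for the example. The dictionaries `TYPE` and `INPUTS` play the roles of the functions Type and Inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    """A function symbol applied to a tuple of argument terms."""
    symbol: str
    args: tuple


# Hypothetical signature: Type assigns a type to variables and function symbols,
# Inputs lists the argument types expected by each function symbol.
TYPE = {"x": "num", "y": "num", "S": "set", "zero": "num", "plus": "num", "card": "num"}
INPUTS = {"zero": (), "plus": ("num", "num"), "card": ("set",)}


def type_of(term):
    """Return the type of a well-formed term, or raise TypeError."""
    if isinstance(term, Var):
        return TYPE[term.name]            # rule 1: variables
    expected = INPUTS[term.symbol]
    actual = tuple(type_of(argument) for argument in term.args)
    if actual != expected:                # rule 2: argument types must match Inputs
        raise TypeError(f"{term.symbol} expects {expected}, got {actual}")
    return TYPE[term.symbol]


def is_well_formed(term):
    """True if the term can be built by the two rules."""
    try:
        type_of(term)
        return True
    except (TypeError, KeyError):
        return False


good = App("plus", (Var("x"), App("card", (Var("S"),))))
bad = App("plus", (Var("x"), Var("S")))
```

Here `zero` is a 0-ary function symbol, that is, a constant, and `bad` is rejected because its second argument has type `set` where `num` is required.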


If there is only one type which is equated directly with truth values then this is essentially a propositional logic. If the standard quantifiers and connectives are used, there is only one type, and one of the relations is $=$ (with its usual semantics), this produces first order logic. If the standard quantifiers and connectives are used, there are two types, and the relations include $=$ and $\in$ with appropriate semantics, this is second order logic (a slightly different formulation replaces $\in$ with a 2-ary function which represents function application; this views second order objects as functions rather than sets).

Note that often connectives are written with infix notation, with parentheses used to control the order of operations.
15.6 second order logic

Second order logic refers to logics with two (or three) types, where one type consists of the objects of interest and the second is either sets of those objects or functions on those objects (or both, in the three type case). For instance, second order arithmetic has two types: the numbers and the sets of numbers.

Formally, second order logic usually has:

• the standard quantifiers (four of them, since each type needs its own universal and existential quantifiers)
• the standard connectives
• the relation $=$ with its normal semantics
• if the second type represents sets, a relation $\in$ where the first argument is of the first type and the second argument is of the second type
• if the second type represents functions, a binary function which takes one argument of each type and results in an object of the first type, representing function application

Specific second order logics may deviate from this definition slightly. In particular, some mathematicians have argued that first order logics with additional quantifiers which give them most or all of the strength of second order logic should be considered second order logics.

Some people, chiefly Quine, have raised philosophical objections to second order logic, centering on the question of whether models require fixing some set of sets or functions as the “actual” sets or functions for the purposes of that model.
Chapter 16


03B40 — Combinatory logic and 
lambda-calculus 


16.1 Church integer 


A Church integer is a representation of integers as functions, invented by Alonzo Church. An integer N is represented as a higher-order function, which applies a given function to a given expression N times.

For example, in Haskell, a function that returns a particular Church integer might be

church 0 = \f x -> x
church n = \f x -> f (church (n - 1) f x)

The transformation from a Church integer to an integer might be

unchurch n = n (+1) 0

Thus the (+1) function would be applied to an initial value of 0 n times, yielding the ordinary integer n.
16.2 combinatory logic

Combinatory logic was invented by Moses Schönfinkel in the early 1920s, and was mostly developed by Haskell Curry. The idea was to reduce the notation of logic to the simplest terms possible. As such, combinatory logic consists only of combinators and combination operations, with no free variables.

A combinator is simply a function with no free variables. A free variable is any variable referred to in a function that is not a parameter of that function. The operation of combination is then simply the application of a combinator to its parameters. Combination is specified by juxtaposition of two terms, and is left-associative. Parentheses may also be present to override associativity. For example

$$fgxy = (fg)xy = ((fg)x)y$$

All combinators in combinatory logic can be derived from two basic combinators, $S$ and $K$. They are defined as

$$Sfgx = fx(gx)$$
$$Kxy = x$$

Reference is sometimes made to a third basic combinator, $I$, which can be defined in terms of $S$ and $K$.

$$Ix = SKKx = x$$

Combinatory logic where $I$ is considered to be derived from $S$ and $K$ is sometimes known as pure combinatory logic.

Combinatory logic and lambda calculus are equivalent. However, lambda calculus is more concise than combinatory logic; an expression of size $O(n)$ in lambda calculus is equivalent to an expression of size $O(n^2)$ in combinatory logic.

For example, $Sfgx = fx(gx)$ in combinatory logic is equivalent to $S = (\lambda f(\lambda g(\lambda x((fx)(gx)))))$, and $Kxy = x$ is equivalent to $K = (\lambda x(\lambda y\, x))$.
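
The defining equations can be tried out directly with curried functions. The following Python sketch is only an illustration, with Python closures standing in for combinators. It defines S and K as in the equations above, derives I as S K K, and checks that it behaves as the identity.

```python
# S f g x = f x (g x): apply f to x, and then to the result of g x.
S = lambda f: lambda g: lambda x: f(x)(g(x))

# K x y = x: keep the first argument and discard the second.
K = lambda x: lambda y: x

# I is not primitive here: I x = S K K x = K x (K x) = x.
I = S(K)(K)

# A curried two-argument function, used to exercise S.
plus = lambda a: lambda b: a + b
double = lambda a: 2 * a
```

Juxtaposition in the text corresponds to nested calls, so $Sfgx$ becomes `S(f)(g)(x)`. For example `S(plus)(double)(3)` computes `plus(3)(double(3))`.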
16.3 lambda calculus

Lambda calculus (often referred to as $\lambda$-calculus) was invented in the 1930s by Alonzo Church, as a form of mathematical logic dealing primarily with functions and the application of functions to their arguments. In pure lambda calculus, there are no constants. Instead, there are only lambda abstractions (which are simply specifications of functions), variables, and applications of functions to functions. For instance, Church integers are used as a substitute for actual constants representing integers.

A lambda abstraction is typically specified using a lambda expression, which might look like the following.

$$\lambda x.\; f\, x$$

The above specifies a function of one argument, that can be reduced by applying the function $f$ to its argument (function application is left-associative by default, and parentheses can be used to specify associativity).

The $\lambda$-calculus is equivalent to combinatory logic (though much more concise). Most functional programming languages are also equivalent to $\lambda$-calculus, to a degree (any imperative features in such languages are, of course, not equivalent).

Examples

We can specify the Church integer 3 in $\lambda$-calculus as

$$3 = \lambda f\, x.\; f\,(f\,(f\, x))$$

Suppose we have a function inc, which when given a string representing an integer, returns a new string representing the number following that integer. Then

$$3\ \mathrm{inc}\ \text{“0”} = \text{“3”}$$

Addition of Church integers in $\lambda$-calculus is

$$\begin{aligned}
\mathrm{add} &= \lambda x\, y.\;(\lambda f\, z.\; x\, f\,(y\, f\, z)) \\
\mathrm{add}\ 2\ 3 &= \lambda f\, z.\; 2\, f\,(3\, f\, z) \\
&= \lambda f\, z.\; 2\, f\,(f\,(f\,(f\, z))) \\
&= \lambda f\, z.\; f\,(f\,(f\,(f\,(f\, z)))) \\
&= 5
\end{aligned}$$

Multiplication is

$$\begin{aligned}
\mathrm{mul} &= \lambda x\, y.\;(\lambda f\, z.\; x\,(\lambda w.\; y\, f\, w)\, z) \\
\mathrm{mul}\ 2\ 3 &= \lambda f\, z.\; 2\,(\lambda w.\; 3\, f\, w)\, z \\
&= \lambda f\, z.\; 2\,(\lambda w.\; f\,(f\,(f\, w)))\, z \\
&= \lambda f\, z.\; f\,(f\,(f\,(f\,(f\,(f\, z))))) \\
&= 6
\end{aligned}$$
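
The addition and multiplication terms can be checked mechanically by transcribing them into any language with first-class functions. The Python sketch below is such a transcription. The helper names `zero`, `succ` and `unchurch` are assumptions added for the illustration, and Python integers and strings are used only to observe the results.

```python
# Church numerals as curried functions: n f z applies f to z exactly n times.
zero = lambda f: lambda z: z
succ = lambda n: lambda f: lambda z: f(n(f)(z))

# add = \x y. (\f z. x f (y f z))
add = lambda x: lambda y: lambda f: lambda z: x(f)(y(f)(z))

# mul = \x y. (\f z. x (\w. y f w) z)
mul = lambda x: lambda y: lambda f: lambda z: x(lambda w: y(f)(w))(z)

# Convert a Church numeral to an ordinary integer by counting applications.
unchurch = lambda n: n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)

# The string successor function "inc" from the example in the text.
inc = lambda s: str(int(s) + 1)
```

With these definitions `unchurch(add(two)(three))` and `unchurch(mul(two)(three))` reproduce the two reductions, and `three(inc)("0")` reproduces the string example.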
Russell’s Paradox in $\lambda$-calculus

The $\lambda$-calculus readily admits Russell’s Paradox. Let us define a function $r$ that takes a function $x$ as an argument, and is reduced to the application of the logical function not to the application of $x$ to itself.

$$r = \lambda x.\; \mathrm{not}\,(x\, x)$$

Now what happens when we apply $r$ to itself?

$$\begin{aligned}
r\, r &= \mathrm{not}\,(r\, r) \\
&= \mathrm{not}\,(\mathrm{not}\,(r\, r)) \\
&\;\;\vdots
\end{aligned}$$

Since we have $\mathrm{not}\,(r\, r) = (r\, r)$, we have a paradox.
Chapter 17


03B48 — Probability and inductive 
logic 


17.1 conditional probability 


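Let $(\Omega, \mathfrak{B}, \mu)$ be a probability space, and let $X$ and $Y$ be events on $\Omega$ with joint probability $\mu(X, Y) := \mu(X \cap Y)$.

The conditional probability of $X$ given $Y$ (for $\mu(Y) > 0$) is defined as

$$\mu(X \mid Y) := \frac{\mu(X \cap Y)}{\mu(Y)}. \qquad (17.1.1)$$

In general,

$$\mu(X \mid Y)\,\mu(Y) = \mu(X, Y) = \mu(Y \mid X)\,\mu(X), \qquad (17.1.2)$$

and so we have

$$\mu(X \mid Y) = \frac{\mu(Y \mid X)\,\mu(X)}{\mu(Y)}. \qquad (17.1.3)$$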
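
A small numerical check may make the last identity concrete. The Python sketch below assumes a finite sample space with the uniform measure (one roll of a fair die), which is a choice made purely for illustration. Events are modelled as sets of outcomes, and exact fractions are used so that both sides of the identity can be compared without rounding.

```python
from fractions import Fraction


def probability(event, omega):
    """Uniform probability of an event, given as a subset of the finite space omega."""
    return Fraction(len(event & omega), len(omega))


def conditional(x, y, omega):
    """Conditional probability of x given y, defined when y has positive probability."""
    return probability(x & y, omega) / probability(y, omega)


omega = frozenset(range(1, 7))   # outcomes of one die roll
even = frozenset({2, 4, 6})
high = frozenset({4, 5, 6})

# Left-hand side: probability of an even roll given a high roll.
lhs = conditional(even, high, omega)

# Right-hand side: the same quantity obtained through the reversed conditional.
rhs = conditional(high, even, omega) * probability(even, omega) / probability(high, omega)
```

Both `lhs` and `rhs` evaluate to the same fraction, since two of the three high outcomes are even.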
Chapter 18


03B99 — Miscellaneous 


18.1 Beth property 


A logic is said to have the Beth property if whenever a predicate $R$ is implicitly definable by $\varphi$ (i.e. if all models have at most one extension satisfying $\varphi$), then $R$ is explicitly definable relative to $\varphi$ (i.e. there is a $\psi$ not containing $R$ such that $\varphi \models \forall x_1, \ldots, x_n\,(R(x_1, \ldots, x_n) \leftrightarrow \psi(x_1, \ldots, x_n))$).
18.2 Hofstadter’s MIU system

The alphabet of the system contains three symbols M, I, U. The set of theorems, that is, the set of strings constructed from the axiom by the rules, is denoted by $T$ and can be built as follows:

(axiom) MI $\in T$.
(i) If xI $\in T$ then xIU $\in T$.
(ii) If Mx $\in T$ then Mxx $\in T$.
(iii) In any theorem, III can be replaced by U.
(iv) In any theorem, UU can be omitted.

Example:

• Show that MUII $\in T$.

MI $\in T$ by axiom
$\Rightarrow$ MII $\in T$ by rule (ii) where x = I
$\Rightarrow$ MIIII $\in T$ by rule (ii) where x = II
$\Rightarrow$ MIIIIIIII $\in T$ by rule (ii) where x = IIII
$\Rightarrow$ MIIIIIIIIU $\in T$ by rule (i) where x = MIIIIIII
$\Rightarrow$ MIIIIIUU $\in T$ by rule (iii)
$\Rightarrow$ MIIIII $\in T$ by rule (iv)
$\Rightarrow$ MUII $\in T$ by rule (iii)

• Is MU a theorem?

No. Why? Because the number of I’s in a theorem is never a multiple of 3. We will show this by structural induction.

Base case: The statement is true for the axiom MI, which has one I, and 1 is not a multiple of 3.

Induction hypothesis: Suppose the statement is true for the premise of each rule.

Induction step: Assuming the statement for the premise of each rule, we show that applying the rule keeps the statement true.

Rule 1: Applying rule 1 does not add any I’s to the string. Therefore the statement is true for rule 1 by the induction hypothesis.

Rule 2: Applying rule 2 doubles the number of I’s. By the induction hypothesis the initial number $n$ of I’s is not a multiple of 3, and doubling it does not make it a multiple of 3 (i.e. if $n \not\equiv 0 \pmod 3$ then $2n \not\equiv 0 \pmod 3$). Therefore the statement is true for rule 2.

Rule 3: Applying rule 3 replaces III by U, which removes exactly three I’s. By the induction hypothesis the initial number $n$ of I’s is not a multiple of 3, and since $n - 3 \equiv n \pmod 3$ the new number is not a multiple of 3 either. Therefore the statement is true for rule 3.

Rule 4: Applying rule 4 removes UU and does not change the number of I’s, which by the induction hypothesis is not a multiple of 3. Therefore the statement is true for rule 4.

Therefore no theorem has a number of I’s that is a multiple of 3. Since MU contains zero I’s, and 0 is a multiple of 3, MU is not a theorem.
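
The four rules are easy to mechanize, which gives an empirical check of both the example derivation and the invariant. The Python sketch below is an illustration with hypothetical function names. It enumerates every theorem reachable from MI through strings of at most a given length. A bounded search cannot by itself prove that MU is unreachable (that is the job of the induction argument), so it only confirms the invariant on the strings it finds.

```python
def successors(s):
    """All strings obtainable from the theorem s by one application of rules (i)-(iv)."""
    out = set()
    if s.endswith("I"):
        out.add(s + "U")                          # (i)   xI  -> xIU
    if s.startswith("M"):
        out.add(s + s[1:])                        # (ii)  Mx  -> Mxx
    for i in range(len(s) - 2):
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])      # (iii) III -> U
    for i in range(len(s) - 1):
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])            # (iv)  UU  -> (nothing)
    return out


def theorems(max_len):
    """Theorems reachable from the axiom MI using only strings of length <= max_len."""
    seen = {"MI"}
    frontier = ["MI"]
    while frontier:
        next_frontier = []
        for s in frontier:
            for t in successors(s):
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    next_frontier.append(t)
        frontier = next_frontier
    return seen


found = theorems(10)
```

The bound 10 is just large enough for the derivation of MUII given above, whose longest intermediate string MIIIIIIIIU has ten symbols.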
REFERENCES

[HD] Hofstadter, Douglas R.: Gödel, Escher, Bach: an Eternal Golden Braid. Basic Books, Inc., New York, 1979.
18.3 IF-logic

Independence Friendly logic (IF-logic) is an interesting conservative extension of classical first order logic based on very natural ideas from game theoretical semantics developed by Jaakko Hintikka and Gabriel Sandu among others. Although IF-logic is a conservative extension of first order logic, it has a number of interesting properties, such as allowing truth-definitions and admitting a translation of all $\Sigma^1_1$ sentences (second order sentences with an initial second order existential quantifier followed by a first order sentence).

IF-logic can be characterised as the natural extension of first order logic when one allows informational independence to occur in the game theoretical truth definition. To understand this idea we need first to introduce the game theoretical definition of truth for classical first order logic.

To each first order sentence $\varphi$ we assign a game $G(\varphi)$ with two players, played on models of the appropriate language. The two players are called the verifier and the falsifier (or nature). The idea is that the verifier attempts to show that the sentence is true in the model, while the falsifier attempts to show that it is false in the model. The game $G(\varphi)$ is defined as follows. We will use the convention that if $p$ is a symbol that names a function, a predicate or an object of the model $M$, then $p^M$ is that named entity.


• if $P$ is an $n$-ary predicate and $t_i$ are names of elements of the model, then $G(P(t_1, \ldots, t_n))$ is a game in which the verifier immediately wins if $(t_1^M, \ldots, t_n^M) \in P^M$ and otherwise the falsifier immediately wins.

• the game $G(\varphi_1 \vee \varphi_2)$ begins with the choice of $\varphi_i$ from $\varphi_1$ and $\varphi_2$ ($i = 1$ or $i = 2$) by the verifier and then proceeds as the game $G(\varphi_i)$

• the game $G(\varphi_1 \wedge \varphi_2)$ is the same as $G(\varphi_1 \vee \varphi_2)$, except that the choice is made by the falsifier

• the game $G(\exists x\, \varphi(x))$ begins with the choice by the verifier of a member of $M$ which is given a name $a$, and then proceeds as $G(\varphi(a))$

• the game $G(\forall x\, \varphi(x))$ is the same as $G(\exists x\, \varphi(x))$, except that the choice of $a$ is made by the falsifier

• the game $G(\neg\varphi)$ is the same as $G(\varphi)$ with the roles of the falsifier and verifier exchanged

Truth of a sentence $\varphi$ is defined as the existence of a winning strategy for the verifier for the game $G(\varphi)$. Similarly, falsity of $\varphi$ is defined as the existence of a winning strategy for the falsifier for the game $G(\varphi)$. (A strategy is a specification which determines, for each move the opponent makes, what the player should do. A winning strategy is a strategy which guarantees victory no matter what strategy the opponent follows.)
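
For a finite model the game rules above can be turned into a short recursive procedure that decides which player has a winning strategy. The Python sketch below is one possible rendering under stated assumptions: formulas are encoded as nested tuples such as `("forall", "x", body)` and `("atom", "P", ("x",))`, an encoding invented for this illustration, and the model is given by a finite domain and a dictionary of relations. The argument `role` records whether the player being tracked currently acts as verifier (`"V"`) or falsifier (`"F"`).

```python
def wins(phi, domain, rel, env=None, role="V"):
    """True if the player currently holding `role` has a winning strategy in G(phi)."""
    env = {} if env is None else env
    op = phi[0]
    if op == "atom":
        _, name, args = phi
        holds = tuple(env[a] for a in args) in rel[name]
        return holds if role == "V" else not holds
    if op == "not":
        # Roles are exchanged: the verifier of G(not phi) plays falsifier in G(phi).
        swapped = "F" if role == "V" else "V"
        return wins(phi[1], domain, rel, env, swapped)
    if op in ("or", "and"):
        chooser = "V" if op == "or" else "F"
        outcomes = [wins(sub, domain, rel, env, role) for sub in phi[1:]]
    elif op in ("exists", "forall"):
        chooser = "V" if op == "exists" else "F"
        _, var, body = phi
        outcomes = [wins(body, domain, rel, {**env, var: a}, role) for a in domain]
    else:
        raise ValueError(f"unknown operator: {op!r}")
    # The chooser needs one good move; the other player must survive every move.
    return any(outcomes) if role == chooser else all(outcomes)


domain = range(4)
rel = {
    "P": {(0,), (2,)},
    "Q": {(1,), (3,)},
    "lt": {(a, b) for a in domain for b in domain if a < b},
}
covers = ("forall", "x", ("or", ("atom", "P", ("x",)), ("atom", "Q", ("x",))))
unbounded = ("forall", "x", ("exists", "y", ("atom", "lt", ("x", "y"))))
```

In this model the verifier wins `covers` and the falsifier wins `unbounded` (by picking the largest element), so each of these perfect-information games has a winner.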
For classical first order logic, this definition is equivalent to the usual Tarskian definition of truth (i.e. the one based on satisfaction found in most treatments of semantics of first order logic). This also means that, since the law of excluded middle holds for first order logic, the games $G(\varphi)$ have a very strong property: either the falsifier or the verifier has a winning strategy.

Notice that all rules except those for negation and atomic sentences concern choosing a sentence or finding an element. These can be codified into functions, which tell us which sentence to pick or which element of the model to choose, based on our previous choices and those of our opponent. For example, consider the sentence $\forall x\,(P(x) \vee Q(x))$. The corresponding game begins with the falsifier picking an element $a$ from the model, so a strategy for the verifier must specify for each element $a$ which of $Q(a)$ and $P(a)$ to pick. The truth of the sentence is equivalent to the existence of a winning strategy for the verifier, i.e. just such a function. But this means that $\forall x\,(P(x) \vee Q(x))$ is equivalent to $\exists f\, \forall x\,((P(x) \wedge f(x) = 0) \vee (Q(x) \wedge f(x) = 1))$. Let’s consider a more complicated example: $\forall x\, \exists y\, \forall z\, \exists s\, P(x, y, z, s)$. The truth of this is equivalent to the existence of functions $f$ and $g$ such that $\forall x\, \forall z\, P(x, f(x), z, g(x, z))$.

Functions of this sort are known as Skolem functions, and they are in essence just winning strategies for the verifier. We won’t prove it here, but all first order sentences can be expressed in the form $\exists f_1 \ldots \exists f_n\, \forall x_1 \ldots \forall x_k\, \varphi$, where $\varphi$ is a truth functional combination of atomic sentences in which all terms are either constants or variables $x_i$ or formed by application of the functions $f_i$ to such terms. Such sentences are said to be in $\Sigma^1_1$ form.

Let’s consider a $\Sigma^1_1$ sentence $\exists f\, \exists g\, \forall x\, \forall z\, \varphi(x, f(x), z, g(z))$. Up front, it seems to assert the existence of a winning strategy in a simple semantical game like those described above. However, the game can’t correspond to any (classical) first order formula. Let’s first see what the game looks like whose winning strategy this formula asserts to exist. First, the falsifier chooses elements $a$ and $b$ to serve as $x$ and $z$. Then the verifier chooses an element $c$ knowing only $a$ and an element $d$ knowing only $b$. The verifier’s goal is that $\varphi(a, c, b, d)$ comes out as a true atomic sentence. The game could actually be arranged so that the verifier is a team of two players (who aren’t allowed to communicate with each other), one of whom picks $c$ while the other picks $d$.


From a game theoretical point of view, games in which some moves must be made without depending on some of the earlier moves are called games of imperfect information, and they occur very commonly. Real examples of such games usually have “players” that are actually teams made up of several people.

IF-logic comes out of the game theoretical definition in a natural way if we allow informational independence in our semantical games. In IF-logic, every connective can be augmented with an independence marker $//$, so that $*//*'$ means that the game for the occurrence of $*'$ within the scope of $*$ must be played without knowledge of the choices made for $*$. For example $(\forall x // \exists y)\exists y\, \varphi(x, y)$ asserts that for any choice of value for $x$ by the falsifier, the verifier can find a value for $y$ which does not depend on the value of $x$, such that $\varphi(x, y)$ comes out true. This is not a very characteristic example, as it can be written as an ordinary first order formula $\exists y\, \forall x\, \varphi(x, y)$. The curious game we described above, corresponding to the second order Skolem-function formulation ($\Sigma^1_1$ sentence) $\exists f\, \exists g\, \forall x\, \forall z\, \varphi(x, f(x), z, g(z))$, corresponds to an IF-sentence $(\forall x // \exists u)(\forall z // \exists y)(\exists y)(\exists u)\, \varphi(x, y, z, u)$. IF-logic allows informational independence also for the usual logical connectives, for example $(\forall x // \vee)(\varphi(x) \vee \psi(x))$ is true if and only if for all $x$, either $\varphi(x)$ or $\psi(x)$ is true, but which of these is picked by the verifier must be decided independently of the choice for $x$ by the falsifier.




One of the striking characteristics of IF-logic is that every X1 formula ¢ has an IF-translation 
¢'F which is true if and only if ¢ is true (the does not in general hold if we 
replace ’true’ with ’false’). Since for example first order truth (in a model) is 5} 
(it’s just quantification over all possible {valuations, which are second order objects), there 
are IF-theories which correctly represent] the truth predicate for their first order {part} What 
is even more striking is that sufficiently strong IF-theories can do this for the whole of the 
language they are expressed in. 


This seems to contradict Tarski’s famous result on the undefinability of truth, but this is
illusory. Tarski’s result depends on the assumption that the logic is closed under contradictory
negation. This is not the case for IF-logic. In general for a given sentence $\phi$ there is no
sentence $\phi^*$ which is true just in case $\phi$ is not true. Thus the law of excluded middle does
not hold in general in IF-logic (although it does for the classical first order portion). This is
quite unsurprising since games of imperfect information are very seldom determined in the
sense that either the verifier or the falsifier has a winning strategy. For example, a game in
which I choose a 10-letter word and you have one go at guessing it is not determined in this
sense, since there is no 10-letter word you couldn’t guess and on the other hand you have
no way of forcing me to choose any particular 10-letter word (which would guarantee your
victory).
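

The same phenomenon can be checked by brute force on a much smaller game. The following is a
minimal sketch, not taken from the entry itself: it assumes a two-element domain and a game in
which the falsifier picks $x$, the verifier picks $y$ without seeing $x$, and the verifier wins
exactly when $x = y$. The names DOMAIN and verifier_wins are illustrative only.

```python
# Toy game of imperfect information on a two-element domain (an assumed example).
DOMAIN = [0, 1]


def verifier_wins(x, y):
    """The verifier wins a play exactly when her y equals the falsifier's x."""
    return x == y


# The verifier does not see x, so each of her strategies is a constant y.
verifier_has_winning_strategy = any(
    all(verifier_wins(x, y) for x in DOMAIN) for y in DOMAIN
)

# The falsifier moves first, so each of his strategies is a choice of x.
falsifier_has_winning_strategy = any(
    all(not verifier_wins(x, y) for y in DOMAIN) for x in DOMAIN
)

# Neither player can force a win, so the game is not determined.
game_is_determined = verifier_has_winning_strategy or falsifier_has_winning_strategy
```

Under these assumptions neither flag is set, which is the failure of the law of excluded middle
described above in miniature.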


IF-logic is stronger than first order logic in the usual sense that there are classes of structures
which are IF-definable but not first-order definable. Some of these are even Many 
interesting concepts are expressible in IF-logic, such as (which can 
be expressed by a logical formula in contradistinction to ordinary first order logic in which 
non-logical symbols are needed), well-order 


By Lindström’s theorem we thus know that either IF-logic is not complete (i.e. its set of
validities is not r.e.) or the Löwenheim-Skolem theorem does not hold. In fact, the (downward)
Löwenheim-Skolem theorem does hold for IF-logic, so it is not complete. There is a complete
disproof procedure for IF-logic, but because IF-logic is not closed under contradictory
negation this does not yield a complete proof procedure.


IF-logic can be extended by allowing contradictory negations of closed sentences and truth
functional combinations thereof. This extended IF-logic is extremely strong. For example,
the second order induction axiom for PA is $\forall X((X(0) \land \forall y(X(y) \to X(y+1))) \to \forall y X(y))$.
The negation of this is a $\Sigma^1_1$ sentence asserting the existence of a set which invalidates the
induction axiom. Since $\Sigma^1_1$ sentences are expressible in IF-logic, we can translate the negation
of the induction axiom into an IF-sentence $\phi$. But now $\neg\phi$ is a formula of extended IF-logic,
and is clearly equivalent to the usual induction axiom! As all the rest of the PA axioms are first
order, this shows that extended IF-logic PA can correctly define the natural number system.


There exists also an interesting “translation” of nth order logic into extended IF-logic. Consider
an n-sorted first order language and an nth order theory $T$ translated into this language.
Now, extend the language to second order and add the axiom stating that the sort $k + 1$
actually comprises the whole of the powerset of the sort $k$. This is a $\Pi^1_1$ sentence (i.e. of
the form “for all predicates $P$ there is a first order element of sort $k + 1$ which comprises
exactly the $k$ extension of $P$”). It is easy to see that a formula is valid in this new system if
and only if it was valid in the original nth order logic. The negation of this axiom is again
$\Sigma^1_1$ and translatable into IF-logic and thus the axiom itself is expressible in extended IF-logic.
Moreover, since most interesting second order theories are finitely axiomatisable, we
can consider sentences of form $T^* \to \phi$ (where $T^*$ is the multisorted translation of $T$), which
express logical implication of $\phi$ by $T$ (correctly). This is equivalent to $\neg(T^*) \lor \phi$ (where $\neg$
is contradictory), but since $T^*$ is a conjunction of a $\Pi^1_1$ sentence asserting comprehension
translated into extended IF-logic and first order translation of the axioms of $T$, this is a $\Sigma^1_1$
formula translatable to non-extended IF-logic and so is $\phi$. Thus sentences of form $T \to \phi$
of nth order logic are translatable into IF-sentences which are true just in case the originals
were.
18.4 Tarski’s result on the undefinability of Truth


Assume $L$ is a logic which is closed under contradictory negation and has the usual
truth-functional connectives. Assume also that $L$ has a notion of open formula with one variable
and of substitution. Assume that $T$ is a theory of $L$ in which we can define surrogates
for formulae of $L$, and in which all true instances of the substitution relation and the
truth-functional connective relations are provable. We show that either $T$ is inconsistent or $T$ can’t
be augmented with a truth predicate True for which the following T-schema holds


$\mathrm{True}(`\phi') \leftrightarrow \phi$


Assume that the open formulae with one variable of $L$ have been indexed by some suitable
set that is representable in $T$ (otherwise the predicate True would be next to useless, since if
there’s no way to speak of sentences of a logic, there’s little hope to define a truth-predicate
for it). Denote the $i$:th element in this indexing by $B_i$. Consider now the following open
formula with one variable


$\mathrm{Liar}(x) = \neg\mathrm{True}(B_x(x))$


Now, since Liar is an open formula with one variable, it’s indexed by some $i$. Now
consider the sentence $\mathrm{Liar}(i)$. From the T-schema we know that


$\mathrm{True}(\mathrm{Liar}(i)) \leftrightarrow \mathrm{Liar}(i)$


and by the definition of Liar and the fact that $i$ is the index of $\mathrm{Liar}(x)$ we have


$\mathrm{True}(\mathrm{Liar}(i)) \leftrightarrow \neg\mathrm{True}(\mathrm{Liar}(i))$


which clearly is absurd. Thus there can’t be an extension of $T$ with a predicate True for
which the T-schema holds.


We have made several assumptions on the logic $L$ which are crucial in order for this proof
to go through. The most important is that $L$ is closed under contradictory negation. There
are logics which allow truth-predicates, but these are not usually closed under contradictory
negation (so that it’s possible that $\mathrm{True}(\mathrm{Liar}(i))$ is neither true nor false). These logics
usually have stronger notions of negation, so that a sentence $\neg P$ says more than just that
$P$ is not true, and that $P$ is simply not true is not expressible.


An example of a logic for which Tarski’s undefinability result does not hold is the so-called
Independence Friendly logic, the semantics of which is based on game theory and which
allows various generalised quantifiers (the Henkin branching quantifier, &c.) to be used.
18.5 axiom


In a nutshell, the logico-deductive method is a system of inference where conclusions (new
knowledge) follow from premises (old knowledge) through the application of sound arguments
(syllogisms, rules of inference). Tautologies excluded, nothing can be deduced if nothing
is assumed. Axioms and postulates are the basic assumptions underlying a given body
of deductive knowledge. They are accepted without demonstration. All other assertions
(theorems, if we are talking about mathematics) must be proven with the aid of the basic
assumptions.


The logico-deductive method was developed by the ancient Greeks, and has become the core
principle of modern mathematics. However, the interpretation of mathematical knowledge
has changed from ancient times to the modern, and consequently the terms axiom and
postulate hold a slightly different meaning for the present day mathematician than they
did for Aristotle and Euclid.


The ancient Greeks considered geometry as just one of several sciences, and held the theorems
of geometry on par with scientific facts. As such, they developed and used the
logico-deductive method as a means of avoiding error, and for structuring and communicating
knowledge. Aristotle’s Posterior Analytics is a definitive exposition of the classical view.


“Axiom”, in classical terminology, referred to a self-evident assumption common to many
branches of science. A good example would be the assertion that

When an equal amount is taken from equals, an equal amount results.


At the foundation of the various sciences lay certain basic hypotheses that had to be accepted 
without proof. Such a hypothesis was termed a postulate. The postulates of each science 
were different. Their validity had to be established by means of real-world experience. 
Indeed, Aristotle warns that the content of a science cannot be successfully communicated, 
if the learner is in doubt about the truth of the postulates. 


The classical approach is well illustrated by Euclid’s Elements, where we see a list of axioms
(very basic, self-evident assertions) and postulates (common-sensical geometric facts drawn 
from our experience). 

A1 Things which are equal to the same thing are also equal to one another. 

A2 If equals be added to equals, the wholes are equal. 

A3 If equals be subtracted from equals, the remainders are equal. 

A4 Things which coincide with one another are equal to one another. 

A5 The whole is greater than the part.

P1 It is possible to draw a straight line from any point to any other point.

P2 It is possible to produce a finite straight line continuously in a straight line.

P3 It is possible to describe a circle with any centre and distance.

P4 It is true that all right angles are equal to one another.


P5 It is true that, if a straight line falling on two straight lines make the interior angles on
the same side less than two right angles, the two straight lines, if produced indefinitely,
meet on that side on which are the angles less than the two right angles.


The classical viewpoint is explored in more detail here.


A great lesson learned by mathematics in the last 150 years is that it is useful to strip the 
meaning away from the mathematical assertions (axioms, postulates, theorems) 
and definitions. This abstraction, one might even say formalization, makes mathematical 
knowledge more general, capable of multiple different meanings, and therefore useful in 
multiple contexts. 


In structuralist mathematics we go even further, and develop theories and axioms (like
field theory, group theory, topology, vector spaces) without any particular application in
mind. The distinction between an “axiom” and a “postulate” disappears. The postulates
of Euclid are profitably motivated by saying that they lead to a great wealth of geometric
facts. The truth of these complicated facts rests on the acceptance of the basic hypotheses.
However by throwing out postulate 5, we get theories that have meaning in wider contexts,
hyperbolic geometry for example. We must simply be prepared to use labels like “line”
and “parallel” with greater flexibility. The development of hyperbolic geometry taught
mathematicians that postulates should be regarded as purely formal statements, and not as
facts based on experience.


When mathematicians employ the axioms of a field, the intentions are even more abstract.
The propositions of field theory do not concern any one particular application; the
mathematician now works in complete abstraction. There are many examples of fields; field theory
gives correct knowledge in all contexts.


It is not correct to say that the axioms of field theory are “propositions that are regarded as
true without proof.” Rather, the field axioms are a set of constraints. If any given system of
addition and multiplication tolerates these constraints, then one is in a position to instantly
know a great deal of extra information about this system. There is a lot of bang for the
formalist buck.


Modern mathematics formalizes its foundations to such an extent that mathematical theories
can be regarded as mathematical objects, and logic itself can be regarded as a branch of
mathematics. Frege, Russell, Hilbert, and Gödel are some of the key figures in
this development.


In the modern understanding, a set of axioms is any collection of formally stated assertions
from which other formally stated assertions follow by the application of certain well-defined
rules. In this view, logic becomes just another formal system. A set of axioms should be
consistent; it should be impossible to derive a contradiction from the axioms. A set of axioms
should also be non-redundant; an assertion that can be deduced from other axioms need not
be regarded as an axiom.


It was the early hope of modern logicians that various branches of mathematics, perhaps 
all of mathematics, could be derived from a consistent collection of basic axioms. An early 
success of the formalist program was Hilbert’s formalization of geometry, and the 
related demonstration of the consistency of those axioms. 


In a wider context, there was an attempt to base all of mathematics on Cantor’s set theory.
Here the emergence of Russell’s paradox and similar antinomies of naive set theory raised
the possibility that any such system could turn out to be inconsistent.


The formalist project suffered a decisive setback, when in 1931 Gödel showed that it is
possible, for any sufficiently large set of axioms (Peano’s axioms, for example) to construct
a statement whose truth is independent of that set of axioms. As a corollary, Gödel proved
that the consistency of a theory like Peano arithmetic is an unprovable assertion within the
scope of that theory.


It is reasonable to believe in the consistency of Peano arithmetic because it is satisfied by
the system of natural numbers, an infinite but intuitively accessible formal system. However,
at this date we have no way of demonstrating the consistency of modern set theory
(Zermelo-Fraenkel axioms). The axiom of choice, a key hypothesis of this theory, remains a
very controversial assumption. Furthermore, using techniques of forcing (Cohen) one can
show that the continuum hypothesis (Cantor) is independent of the Zermelo-Fraenkel axioms.


Thus, even this very general set of axioms cannot be regarded as the definitive foundation
for mathematics.
18.6 compactness


A logic is said to be $(\kappa, \lambda)$-compact, if the following holds


If $\Phi$ is a set of sentences of cardinality less than or equal to $\kappa$ and all subsets of
$\Phi$ of cardinality less than $\lambda$ are consistent, then $\Phi$ is consistent.


For example, first order logic is $(\omega, \omega)$-compact, for if all finite subsets of some class of
sentences are consistent, so is the class itself.
18.7 consistent


If $T$ is a theory of $L$ then it is consistent iff there is some model $M$ of $L$ such that $M \models T$.


If a theory is not consistent then it is inconsistent.


A slightly different definition is sometimes used, that $T$ is consistent iff $T \nvdash \bot$ (that is, as
long as it does not prove a contradiction). As long as the proof calculus used is sound and
complete, these two definitions are equivalent.
Version: 3 Owner: Henry Author(s): Henry 
18.8 interpolation property 


A logic is said to have the interpolation property if whenever $\phi(R, S) \to \psi(R, T)$ holds, then
there is a sentence $\theta(R)$, so that $\phi(R, S) \to \theta(R)$ and $\theta(R) \to \psi(R, T)$, where $R$, $S$ and $T$
are some sets of symbols that occur in the formulae, $R$ being the set of symbols common to
both $\phi$ and $\psi$.


The interpolation property holds for first order logic. The interpolation property is related
to the Beth definability property and Robinson’s consistency property. Also, a natural
generalisation is the concept of a $\Delta$-closed logic.


Version: 2 Owner: Aatu Author(s): Aatu 


18.9 sentence 


A sentence is a formula with no free variables.
Simple examples include:


$\forall x \exists y [x < y]$


or


$\exists z [z + 7 - 43 = 0]$


However the following formula is not a sentence:


$x + 2 = 3$
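

The test for being a sentence is mechanical once formulas are given as syntax trees. The
following is a minimal sketch under an assumed encoding (nested tuples tagged "atom", "not",
"and", "or", "implies", "forall", "exists"); the encoding and the names free_variables and
is_sentence are illustrative and not part of the entry.

```python
def free_variables(formula):
    """Return the set of variables occurring free in a formula given as nested tuples."""
    kind = formula[0]
    if kind == "atom":
        # ("atom", relation_name, tuple_of_variable_names)
        _, _relation, variables = formula
        return set(variables)
    if kind == "not":
        return free_variables(formula[1])
    if kind in ("and", "or", "implies"):
        return free_variables(formula[1]) | free_variables(formula[2])
    if kind in ("forall", "exists"):
        # ("forall", bound_variable, body): the quantifier binds its variable in the body
        _, bound, body = formula
        return free_variables(body) - {bound}
    raise ValueError(f"unknown formula node: {kind!r}")


def is_sentence(formula):
    """A sentence is a formula with no free variables."""
    return not free_variables(formula)


# forall x exists y [x < y]
ordered = ("forall", "x", ("exists", "y", ("atom", "<", ("x", "y"))))
# x + 2 = 3, with the constants 2 and 3 left out of the variable list
open_formula = ("atom", "x+2=3", ("x",))
```

With this encoding, is_sentence(ordered) holds while open_formula still has x free.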
Chapter 19


03Bxx — General logic 


19.1 Banach-Tarski paradox 


The 3-dimensional ball can be split in a finite number of pieces which can be pasted together
to give two balls of the same volume as the first!


Let us formulate the theorem formally. We say that a set $A \subset \mathbb{R}^n$ is decomposable
in $N$ pieces $A_1, \dots, A_N$ if there exist some isometries $\theta_1, \dots, \theta_N$ of $\mathbb{R}^n$ such that
$A = \theta_1(A_1) \cup \dots \cup \theta_N(A_N)$ while $\theta_1(A_1), \dots, \theta_N(A_N)$ are all disjoint.


We then say that two sets $A, B \subset \mathbb{R}^n$ are equi-decomposable if both $A$ and $B$ are
decomposable in the same pieces $A_1, \dots, A_N$.


Theorem 2 (Banach-Tarski). The unit ball $B^3 \subset \mathbb{R}^3$ is equi-decomposable to the union of
two disjoint unit balls.


19.1.1 Comments 


The actual number of pieces needed for this is not so large. Say that ten pieces are enough.


Also it is not important that the set considered is a ball. Every two bounded sets with non-empty
interior are equi-decomposable in $\mathbb{R}^3$. Also the ambient space can be chosen larger. The
theorem is true in all $\mathbb{R}^n$ with $n \ge 3$ but it is not true in $\mathbb{R}^2$ nor in $\mathbb{R}$.


Where is the paradox? We are saying that a piece of (say) gold can be cut and pasted to
obtain two pieces equal to the previous one. And we may divide these two pieces in the same
way to obtain four pieces and so on...


We believe that this is not possible since the weight of the piece of gold does not change
when we cut it.


A consequence of this theorem is, in fact, that it is not possible to define the volume for all
subsets of the 3-dimensional space. In particular the volume cannot be computed for some
of the pieces in which the unit ball is decomposed (some of them are not measurable).


The existence of non-measurable sets is proved more simply and in all dimensions by
Vitali’s theorem.


However the Banach-Tarski paradox says something more. It says that it is not possible to define
a measure on all the subsets of $\mathbb{R}^3$ even if we drop the countable additivity and replace it
with a finite additivity:


$\mu(A \cup B) = \mu(A) + \mu(B) \quad \forall A, B$ disjoint.


Another point to be noticed is that the proof needs the axiom of choice. So some of the
pieces in which the ball is divided are not constructible.
Chapter 20


03C05 — Equational classes, universal 
algebra 


20.1 congruence 


Let $\Sigma$ be a fixed signature and $A$ a structure for $\Sigma$. A congruence $\sim$ on $A$ is an
equivalence relation such that for every natural number $n$ and $n$-ary function symbol $F$
of $\Sigma$, if $a_i \sim a_i'$ then $F^A(a_1, \dots, a_n) \sim F^A(a_1', \dots, a_n')$.
20.2 every congruence is the kernel of a homomorphism


Let $\Sigma$ be a fixed signature and $A$ a structure for $\Sigma$. If $\sim$ is a congruence on $A$, then there
is a homomorphism $f$ such that $\sim = \ker(f)$.


Define a homomorphism $f : A \to A/{\sim} : a \mapsto [a]$. Observe that $a \sim b$ if and only if
$f(a) = f(b)$, so $\sim = \ker(f)$. To verify that $f$ is a homomorphism, observe that


1. For each constant symbol $c$ of $\Sigma$, $f(c^A) = [c^A] = c^{A/\sim}$.


2. For every natural number $n$ and $n$-ary relation symbol $R$ of $\Sigma$, if $R^A(a_1, \dots, a_n)$ then
$R^{A/\sim}([a_1], \dots, [a_n])$, so $R^{A/\sim}(f(a_1), \dots, f(a_n))$.


3. For every natural number $n$ and $n$-ary function symbol $F$ of $\Sigma$,


$F^{A/\sim}(f(a_1), \dots, f(a_n)) = [F^A(a_1, \dots, a_n)] = f(F^A(a_1, \dots, a_n))$.
20.3 homomorphic image of a $\Sigma$-structure is a $\Sigma$-structure


Let $\Sigma$ be a fixed signature and $A$ and $B$ two structures for $\Sigma$. If $f : A \to B$ is a
homomorphism, then $i(f)$ is a structure for $\Sigma$.
20.4 kernel


Given a function $f : A \to B$, the kernel of $f$ is the equivalence relation on $A$ defined by
$(a, a') \in \ker(f) \Leftrightarrow f(a) = f(a')$.
20.5 kernel of a homomorphism is a congruence


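Let $\Sigma$ be a fixed signature and $A$ and $B$ two structures for $\Sigma$. If $f : A \to B$ is a homomorphism,
then $\ker(f)$ is a congruence on $A$.


If $F$ is an $n$-ary function symbol of $\Sigma$, and $f(a_i) = f(a_i')$, then


$f(F^A(a_1, \dots, a_n)) = F^B(f(a_1), \dots, f(a_n)) = F^B(f(a_1'), \dots, f(a_n')) = f(F^A(a_1', \dots, a_n'))$.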
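

For a finite structure the statement can be checked exhaustively. The following is a minimal
sketch, assuming the structure is the integers modulo 12 with addition as its only function
symbol and the homomorphism is reduction modulo 4; the helper names kernel_classes and
is_congruence are illustrative only.

```python
from itertools import product


def kernel_classes(f, domain):
    """Partition `domain` into the classes of ker(f): a ~ a' iff f(a) == f(a')."""
    classes = {}
    for a in domain:
        classes.setdefault(f(a), []).append(a)
    return list(classes.values())


def is_congruence(classes, op, arity):
    """Check that equivalent arguments always give equivalent results under `op`."""
    class_of = {a: i for i, block in enumerate(classes) for a in block}
    elements = list(class_of)
    for args in product(elements, repeat=arity):
        for other in product(elements, repeat=arity):
            if all(class_of[a] == class_of[b] for a, b in zip(args, other)):
                if class_of[op(*args)] != class_of[op(*other)]:
                    return False
    return True


def add_mod_12(x, y):
    return (x + y) % 12


# Kernel of the homomorphism x -> x mod 4 on (Z_12, +).
kernel = kernel_classes(lambda x: x % 4, range(12))
# An arbitrary partition that is not the kernel of any homomorphism of (Z_12, +).
lopsided = [[0], list(range(1, 12))]
```

Here is_congruence(kernel, add_mod_12, 2) succeeds, while the lopsided partition fails because
1 + 11 and 1 + 1 land in different blocks.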
20.6 quotient structure


Let $\Sigma$ be a fixed signature, $A$ a structure for $\Sigma$, and $\sim$ a congruence on $A$. The quotient
structure of $A$ by $\sim$, denoted $A/{\sim}$, is defined as follows:


1. The universe of $A/{\sim}$ is the set $\{[a] \mid a \in A\}$.
2. For each constant symbol $c$ of $\Sigma$, $c^{A/\sim} = [c^A]$.
3. For every natural number $n$ and every $n$-ary function symbol $F$ of $\Sigma$,


$F^{A/\sim}([a_1], \dots, [a_n]) = [F^A(a_1, \dots, a_n)]$.


4. For every natural number $n$ and every $n$-ary relation symbol $R$ of $\Sigma$, $R^{A/\sim}([a_1], \dots, [a_n])$
if and only if for some $a_i' \sim a_i$ we have $R^A(a_1', \dots, a_n')$.
Chapter 21


03C07 — Basic properties of first-order 
languages and structures 


21.1 Models constructed from constants 


The definition of a structure and of the satisfaction relation is nice, but it raises the following
question: how do we get models in the first place? The most basic construction for models
of a first-order theory is the construction that uses constants. Throughout this entry, $L$ is a
first-order language.


Let $C$ be a set of constant symbols of $L$, and $T$ be a theory in $L$. Then we say $C$ is a set of
witnesses for $T$ if and only if for every formula $\varphi$ with at most one free variable $x$, we have
$T \vdash \exists x(\varphi) \Rightarrow \varphi(c)$ for some $c \in C$.


Lemma. Let $T$ be any consistent set of sentences of $L$, and $C$ a set of new symbols such
that $|C| = |L|$. Let $L' = L \cup C$. Then there is a consistent set $T' \subseteq L'$ extending $T$ and
which has $C$ as a set of witnesses.


Lemma. If $T$ is a consistent theory in $L$, and $C$ is a set of witnesses for $T$ in $L$, then $T$ has
a model whose elements are the constants in $C$.


Proof: Let $\Sigma$ be the signature for $L$. If $T$ is a consistent set of sentences of $L$, then there is
a maximal consistent $T' \supseteq T$. Note that $T'$ and $T$ have the same sets of witnesses. As every
model of $T'$ is also a model of $T$, we may assume $T$ is maximal consistent.


We let the universe of $M$ be the set of equivalence classes $C/{\sim}$, where $a \sim b$ if and only if
“$a = b$” $\in T$. As $T$ is maximal consistent, this is an equivalence relation. We interpret the
non-logical symbols as follows:


1. $[a] =^M [b]$ if and only if $a \sim b$;
2. Constant symbols are interpreted in the obvious way, i.e. if $c \in \Sigma$ is a constant symbol,
then $c^M = [c]$;
3. If $R \in \Sigma$ is an $n$-ary relation symbol, then $([a_1], \dots, [a_n]) \in R^M$ if and only if
$R(a_1, \dots, a_n) \in T$;
4. If $F \in \Sigma$ is an $n$-ary function symbol, then $F^M([a_1], \dots, [a_n]) = [b]$ if and only if
“$F(a_1, \dots, a_n) = b$” $\in T$.


From the fact that $T$ is maximal consistent, and $\sim$ is an equivalence relation, we get that
the operations are well-defined (this is not hard to check). The proof that
$M \models T$ is a straightforward induction on the complexity of the formulas of $T$. □


Corollary. (The extended completeness theorem) A set $T$ of formulas of $L$ is consistent if
and only if it has a model (regardless of whether or not $L$ has witnesses for $T$).


Proof: First add a set $C$ of new constants to $L$, and expand $T$ to $T'$ in such a way that $C$
is a set of witnesses for $T'$. Then expand $T'$ to a maximal consistent set $T''$. This set has a
model $M$ consisting of the constants in $C$, and $M$ is also a model of $T$. □


Corollary. (compactness theorem) A set $T$ of sentences of $L$ has a model if and only if
every finite subset of $T$ has a model.


Proof: Replace “has a model” by “is consistent”, and apply the syntactic compactness
theorem. □


Corollary. (Gödel’s completeness theorem) Let $T$ be a consistent set of formulas of $L$. Then
a sentence $\varphi$ is a theorem of $T$ if and only if it is true in every model of $T$.


Proof: If $\varphi$ is not a theorem of $T$, then $\neg\varphi$ is consistent with $T$, so $T \cup \{\neg\varphi\}$ has a model
$M$, in which $\varphi$ cannot be true. □


Corollary. (Downward Löwenheim-Skolem theorem) If $T \subseteq L$ has a model, then it has a
model of power at most $|L|$.


If $T$ has a model, then it is consistent. The model constructed from constants has power
at most $|L|$ (because we must add at most $|L|$ many new constants). □


Most of the treatment found in this entry can be read in more detail in Chang and Keisler’s
book Model Theory.
21.2 Stone space


Suppose $L$ is a first order language and $B$ is a set of parameters from an $L$-structure $M$.
Let $S_n(B)$ be the set of (complete) $n$-types over $B$ (see type). Then we put a topology on
$S_n(B)$ in the following manner.


For every formula $\psi \in L(B)$ we let $S(\psi) := \{p \in S_n(B) : \psi \in p\}$. Then the topology is the
one with a basis of open sets given by $\{S(\psi) : \psi \in L(B)\}$. Then we call $S_n(B)$ endowed
with this topology the Stone space of complete $n$-types over $B$.


Some logical theorems and conditions are equivalent to topological conditions on this topology.


• The compactness theorem for first order logic is so named because it is equivalent to
this topology being compact.


• We define $p$ to be an isolated type iff $p$ is an isolated point in the Stone space. This is
equivalent to there being some formula $\psi$ so that for every $\phi \in p$ we have $T \vdash \psi \to \phi$,
i.e. all the formulas in $p$ are implied by some formula.


• The Morley rank of a type $p \in S_1(M)$ is equal to the Cantor-Bendixson rank of $p$ in
this space.


The idea of considering the Stone space of types dates back to [1].


We can see that the set of formulas in a language is a boolean lattice. A type is an ultra-filter
on this lattice. The definition of a Stone space can be made in an analogous way on the set
of ultra-filters on any boolean lattice.


REFERENCES


1. M. Morley, Categoricity in power. Trans. Amer. Math. Soc. 114 (1965), 514-538.
21.3 alphabet


An alphabet $\Sigma$ is a nonempty finite set of symbols. The main restriction is that we must
make sure that every string formed from $\Sigma$ can be broken back down into symbols in only
one way.


For example, {b, lo, g, bl, og} is not a valid alphabet because the string blog can be broken
up in two ways: b lo g and bl og. {Ca, iia, d, a} is a valid alphabet, because there is only
one way to fully break up any given string formed from it.


If $\Sigma$ is our alphabet and $n \in \mathbb{Z}^+$, we define the following as the powers of $\Sigma$

• $\Sigma^0 = \{\lambda\}$, where $\lambda$ stands for the empty string
• $\Sigma^n = \{xy \mid x \in \Sigma, y \in \Sigma^{n-1}\}$ ($xy$ is the juxtaposition of $x$ and $y$)


So, $\Sigma^n$ is the set of all strings formed from $\Sigma$ of length $n$.
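

Both the unique-breakdown restriction and the recursive definition of the powers can be tried
out directly. The following is a minimal sketch, assuming symbols are represented as Python
strings and the empty string "" plays the role of the empty string of the entry; the function
names power and count_parses are illustrative only.

```python
def power(alphabet, n):
    """Return the n-th power of the alphabet: all juxtapositions of n symbols."""
    if n == 0:
        return {""}  # only the empty string
    return {x + y for x in alphabet for y in power(alphabet, n - 1)}


def count_parses(string, alphabet):
    """Count the ways of breaking `string` back down into symbols of `alphabet`."""
    if string == "":
        return 1
    return sum(
        count_parses(string[len(symbol):], alphabet)
        for symbol in alphabet
        if string.startswith(symbol)
    )


invalid = {"b", "lo", "g", "bl", "og"}
binary = {"a", "b"}
```

For the invalid alphabet above, count_parses("blog", invalid) is 2 (b lo g and bl og), which is
exactly why it is rejected; for a valid alphabet every string has at most one parse.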
21.4 axiomatizable theory


Let $T$ be a first order theory. A subset $\Delta \subseteq T$ is a set of axioms for $T$ if and only if $T$ is
the set of all consequences of the formulas in $\Delta$. In other words, $\varphi \in T$ if and only if $\varphi$ is
provable using only assumptions from $\Delta$.


Definition. A theory $T$ is said to be finitely axiomatizable if and only if there is a finite
set of axioms for $T$; it is said to be recursively axiomatizable if and only if it has a
recursive set of axioms.


For example, group theory is finitely axiomatizable (it has only three axioms), and Peano arithmetic
is recursively axiomatizable: there is an algorithm that can decide if a formula of
the language of the natural numbers is an axiom.


Theorem. Complete recursively axiomatizable theories are decidable.


As an example of the use of this theorem, consider the theory of algebraically closed fields
of characteristic $p$ for any prime number $p$ or 0. It is complete, and the set of axioms is
obviously recursive, and so it is decidable.
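

The idea behind the theorem is a search: enumerate the theorems and stop as soon as the given
sentence or its negation appears, which completeness guarantees will happen. The following is
only a toy sketch of that search. It assumes a made-up "theory" of parity facts in place of a
real proof enumerator, and the names negate, enumerate_theorems and decide are hypothetical.

```python
from itertools import count


def negate(sentence):
    """Toggle a leading 'not ' (a toy syntax for negation, assumed for this sketch)."""
    return sentence[4:] if sentence.startswith("not ") else "not " + sentence


def enumerate_theorems():
    """Toy stand-in for a proof enumerator: lists the true parity facts one by one."""
    for n in count():
        yield f"even({n})" if n % 2 == 0 else f"not even({n})"


def decide(sentence):
    """Search the enumeration until the sentence or its negation turns up.

    Termination relies on completeness: for every sentence of the toy language,
    either it or its negation is eventually enumerated.
    """
    negation = negate(sentence)
    for theorem in enumerate_theorems():
        if theorem == sentence:
            return True
        if theorem == negation:
            return False
```

For a genuine theory the enumerator would list all proofs from the recursive axiom set; the
control flow of decide would stay the same.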
21.5 definable


21.5.1 Definable sets and functions 


Definability In Model Theory 


Let $\mathcal{L}$ be a first order language. Let $M$ be an $\mathcal{L}$-structure. Denote $x_1, \dots, x_n$ by $\bar{x}$ and
$y_1, \dots, y_m$ by $\bar{y}$, and suppose $\phi(\bar{x}, \bar{y})$ is a formula from $\mathcal{L}$, and $b_1, \dots, b_m$ is some sequence
from $M$.


Then we write $\phi(M^n, \bar{b})$ to denote $\{\bar{a} \in M^n : M \models \phi(\bar{a}, \bar{b})\}$. We say that $\phi(M^n, \bar{b})$ is
$\bar{b}$-definable. More generally if $S$ is some set and $B \subseteq M$, and there is some $\bar{b}$ from $B$ so that
$S$ is $\bar{b}$-definable then we say that $S$ is $B$-definable.


In particular we say that a set $S$ is $\emptyset$-definable or zero definable iff it is the solution set of
some formula without parameters.


Let $f$ be a function, then we say $f$ is $B$-definable iff the graph of $f$ (i.e. $\{(x, y) : f(x) = y\}$)
is a $B$-definable set.


If $S$ is $B$-definable then any automorphism of $M$ that fixes $B$ pointwise fixes $S$ setwise.


A set or function is definable iff it is B-definable for some parameters B. 
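

In a finite structure the solution set of a formula can simply be computed. The following is a
minimal sketch, assuming the structure is the integers modulo 7 with multiplication and that a
formula is represented by a Python predicate; the names definable_set, is_square and
is_scaled_square are illustrative only.

```python
def definable_set(universe, formula, parameters=()):
    """Return the solution set {a in universe : formula(a, parameters) holds}."""
    return {a for a in universe if formula(a, *parameters)}


Z7 = range(7)


def is_square(x):
    """phi(x): there is some y with y*y = x (no parameters)."""
    return any((y * y) % 7 == x for y in Z7)


def is_scaled_square(x, b):
    """phi(x, b): there is some y with b*y*y = x (one parameter b)."""
    return any((b * y * y) % 7 == x for y in Z7)


squares = definable_set(Z7, is_square)                       # defined without parameters
scaled = definable_set(Z7, is_scaled_square, parameters=(3,))  # defined over the parameter 3
```

The first set is defined without parameters, while the second is defined using the
parameter 3.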


Some authors use the term definable to mean what we have called $\emptyset$-definable here. If this
is the convention of a paper, then the term parameter definable will refer to sets that are
definable over some parameters.


Sometimes in model theory it is not actually very important what language one is using, but
merely what the definable sets are, or what the definability relation is.


Definability of functions in Proof Theory

In proof theory, given a theory $T$ in the language $\mathcal{L}$, for a function $f : M \to M$ to be
definable in the theory $T$, we have two conditions:


(i) There is a formula in the language $\mathcal{L}$ s.t. $f$ is definable over the model $M$, as in the above
definition; i.e., its graph is definable in the language $\mathcal{L}$ over the model $M$, by some formula
$\varphi(\bar{x}, y)$.
(ii) The theory $T$ proves that $f$ is indeed a function, that is $T \vdash \forall \bar{x} \exists! y.\, \varphi(\bar{x}, y)$.


For example: the graph of the exponentiation function $x^y = z$ is definable by the language of
the theory $I\Delta_0$ (a weak subsystem of PA), however the function itself is not definable in
this theory.
21.6 definable type


Let $M$ be a first order structure. Let $A$ and $B$ be sets of parameters from $M$. Let $p$ be
a complete $n$-type over $B$. Then we say that $p$ is an $A$-definable type iff for every formula
$\psi(\bar{x}, \bar{y})$ with $\mathrm{ln}(\bar{x}) = n$, there is some formula $d\psi(\bar{y}, \bar{z})$ and some parameters $\bar{a}$ from $A$ so
that for any $\bar{b}$ from $B$ we have $\psi(\bar{x}, \bar{b}) \in p$ iff $M \models d\psi(\bar{b}, \bar{a})$.


Note that if $p$ is a type over the model $M$ then this condition is equivalent to showing that
$\{\bar{b} \in M : \psi(\bar{x}, \bar{b}) \in p\}$ is an $A$-definable set.


For $p$ a type over $B$, we say $p$ is definable if it is $B$-definable.


If $p$ is definable, we call $d\psi$ the defining formula for $\psi$, and the function $\psi \mapsto d\psi$ a defining
scheme for $p$.
21.7 downward Löwenheim-Skolem theorem


Let $L$ be a first order language, let $A$ be an $L$-structure and let $K \subseteq \mathrm{dom}(A)$. Then there is
an $L$-structure $B$ such that $K \subseteq B$ and $|B| \le \mathrm{Max}(|K|, |L|)$ and $B$ is elementarily embedded
in $A$.
21.8 example of definable type


Consider $(\mathbb{Q}, <)$ as a structure in a language with one binary relation, which we interpret as
the order. This is a universal, $\aleph_0$-categorical structure (see example of universal structure).


The theory of $(\mathbb{Q}, <)$ has quantifier elimination, and so is o-minimal. Thus a type over the
set $\mathbb{Q}$ is determined by the quantifier free formulas over $\mathbb{Q}$, which in turn are determined by
the atomic formulas over $\mathbb{Q}$. An atomic formula in one variable over $B$ is of the form $x < b$
or $x > b$ or $x = b$ for some $b \in B$. Thus each 1-type over $\mathbb{Q}$ determines a Dedekind cut over
$\mathbb{Q}$, and conversely a Dedekind cut determines a complete type over $\mathbb{Q}$. Let
$D(p) := \{a \in \mathbb{Q} : x > a \in p\}$.


Thus there are two classes of type over $\mathbb{Q}$.


1. Ones where $D(p)$ is of the form $(-\infty, a)$ or $(-\infty, a]$ for some $a \in \mathbb{Q}$. It is clear that
these are definable from the above discussion.


2. Ones where $D(p)$ has no supremum in $\mathbb{Q}$. These are clearly not definable by o-minimality
of $\mathbb{Q}$.
21.9 example of strongly minimal


Let $L_R$ be the language of rings. In other words $L_R$ has two constant symbols $0, 1$ and three
function symbols $+, \cdot, -$. Let $T$ be the $L_R$-theory that includes the field axioms and
for each $n$ the formula


$\forall x_0, x_1, \dots, x_n \exists y \left( \neg\left( \bigwedge_{1 \le i \le n} x_i = 0 \right) \to \sum_{0 \le i \le n} x_i y^i = 0 \right)$


which expresses that every degree $n$ polynomial which is non constant has a root. Then any
model of $T$ is an algebraically closed field. One can show that, once the characteristic is fixed,
this is a complete theory and that it has quantifier elimination (Tarski). Thus every $B$-definable
subset of any $K \models T$ is definable by a quantifier free formula in $L_R(B)$ with one free variable $y$.
A quantifier free formula is a boolean combination of atomic formulas. Each of these is of the
form $\sum_{i < n} b_i y^i = 0$ which defines a finite set. Thus every definable subset of $K$ is a finite
or cofinite set. Thus $K$ and $T$ are strongly minimal.
21.10 first isomorphism theorem


Let Σ be a fixed signature, and A and B structures for Σ. If f : A → B is a homomorphism,
then A/ker(f) is bimorphic to i(f). Furthermore, if f has the additional property that for
every natural number n and n-ary relation symbol R of Σ,

\[ R^{B}(f(a_1), \ldots, f(a_n)) \Rightarrow \exists a_i' \, [\, f(a_i) = f(a_i') \wedge R^{A}(a_1', \ldots, a_n') \,], \]

then A/ker(f) ≅ i(f).
Since the homomorphic image of a Σ-structure is also a Σ-structure, we may assume that
i(f) = B.


Let ∼ = ker(f). Define a bimorphism φ : A/∼ → B : [a] ↦ f(a). To verify that φ is well
defined, let a ∼ a'. Then φ([a]) = f(a) = f(a') = φ([a']). To show that φ is injective,
suppose φ([a]) = φ([a']). Then f(a) = f(a'), so a ∼ a'. Hence [a] = [a']. To show that φ is
a homomorphism, observe that for any constant symbol c of Σ we have $\varphi([c^{A}]) = f(c^{A}) = c^{B}$.
For every natural number n and n-ary relation symbol R of Σ,

\[
\begin{aligned}
R^{A/\sim}([a_1], \ldots, [a_n]) &\Rightarrow R^{A}(a_1, \ldots, a_n) \\
&\Rightarrow R^{B}(f(a_1), \ldots, f(a_n)) \\
&\Rightarrow R^{B}(\varphi([a_1]), \ldots, \varphi([a_n])).
\end{aligned}
\]

For every natural number n and n-ary function symbol F of Σ,

\[
\begin{aligned}
\varphi(F^{A/\sim}([a_1], \ldots, [a_n])) &= \varphi([F^{A}(a_1, \ldots, a_n)]) \\
&= f(F^{A}(a_1, \ldots, a_n)) \\
&= F^{B}(f(a_1), \ldots, f(a_n)) \\
&= F^{B}(\varphi([a_1]), \ldots, \varphi([a_n])).
\end{aligned}
\]

Thus φ is a bimorphism.


Now suppose f has the additional property mentioned in the statement of the theorem.
Then

\[
\begin{aligned}
R^{B}(\varphi([a_1]), \ldots, \varphi([a_n])) &\Rightarrow R^{B}(f(a_1), \ldots, f(a_n)) \\
&\Rightarrow \exists a_i' \, [\, a_i \sim a_i' \wedge R^{A}(a_1', \ldots, a_n') \,] \\
&\Rightarrow R^{A/\sim}([a_1], \ldots, [a_n]).
\end{aligned}
\]

Thus φ is an isomorphism.
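
The construction of the quotient and of the induced map can be tried out on a small finite case. The following is only a sketch: the function names `kernel_classes` and `induced_map`, and the choice of reduction mod 2 on Z/4 as the homomorphism, are illustrative assumptions and do not come from the entry.

```python
def kernel_classes(f, universe):
    """Partition `universe` into the classes of ker(f): a ~ a' iff f(a) == f(a')."""
    by_value = {}
    for a in universe:
        by_value.setdefault(f[a], set()).add(a)
    return {frozenset(cls) for cls in by_value.values()}


def induced_map(f, universe):
    """The map phi : A/~ -> B sending the class [a] to f(a)."""
    return {cls: f[next(iter(cls))] for cls in kernel_classes(f, universe)}


# Hypothetical example: f : Z/4 -> Z/2 is reduction mod 2.
Z4 = range(4)
f = {a: a % 2 for a in Z4}

classes = kernel_classes(f, Z4)
phi = induced_map(f, Z4)

# phi is well defined: every member of a class has the same image under f.
well_defined = all(f[a] == phi[cls] for cls in classes for a in cls)
# phi is injective: distinct classes have distinct images.
injective = len(set(phi.values())) == len(phi)
```

Here the two classes are {0, 2} and {1, 3}, and the induced map is a bijection onto the image of f, as the proof predicts.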
21.11 language


Let Σ be an alphabet. We then define the following using the powers of an alphabet and
infinite union, where n ∈ Z.

\[ \Sigma^{+} = \bigcup_{n=1}^{\infty} \Sigma^{n} \]

\[ \Sigma^{*} = \bigcup_{n=0}^{\infty} \Sigma^{n} = \Sigma^{+} \cup \{\lambda\} \]

A string is an element of Σ*, meaning that it is a grouping of symbols from Σ one after
another. For example, abbc is a string, and cbba is a different string. Σ⁺, like Σ*, contains
all finite strings except that Σ⁺ does not contain the empty string λ.


A language over Σ is a subset of Σ*, meaning that it is a set of strings made from the
symbols in the alphabet Σ.


Take for example an alphabet Σ = {♡, ℘, 63, a, A}. We can construct languages over Σ, such
as: L = {aaa, λ, A℘63, 63♡, AaAaA}, or {℘a, ℘aa, ℘aaa, ℘aaaa, ...}, or even the empty set
∅. In the context of languages, ∅ is called the empty language.
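
The definitions of the powers of an alphabet and of Σ⁺ and Σ* can be made concrete with a short sketch. Since Σ* is infinite, the code only builds the strings up to a chosen length. The names `power`, `star_up_to` and `plus_up_to`, the three-letter alphabet, and the modelling of a string as a tuple of symbols are all assumptions made for illustration.

```python
from itertools import product


def power(alphabet, n):
    """All strings of exactly n symbols (the power Sigma^n).

    A string is modelled as a tuple of symbols, so the empty string is ().
    """
    return {tuple(word) for word in product(sorted(alphabet), repeat=n)}


def star_up_to(alphabet, max_len):
    """Finite approximation of Sigma^*: all strings of length 0..max_len."""
    result = set()
    for n in range(max_len + 1):
        result |= power(alphabet, n)
    return result


def plus_up_to(alphabet, max_len):
    """Finite approximation of Sigma^+: all strings of length 1..max_len."""
    return star_up_to(alphabet, max_len) - {()}


sigma = {"a", "b", "c"}
EMPTY = ()  # plays the role of the empty string

star = star_up_to(sigma, 4)
plus = plus_up_to(sigma, 4)

# A language is any subset of Sigma^*; here is a small finite one.
language = {("a", "b", "b", "c"), ("c", "b", "b", "a"), EMPTY}
```

Modelling strings as tuples rather than Python `str` values keeps the sketch valid for alphabets whose symbols are longer than one character.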
21.12 length of a string


Suppose we have a string w on alphabet Σ. We can then represent the string as
$w = x_1 x_2 x_3 \cdots x_{n-1} x_n$, where for all $x_i$ (1 ≤ i ≤ n), $x_i \in \Sigma$ (this means that each $x_i$ must be
a "letter" from the alphabet). Then, the length of w is n. The length of a string w is
represented as ‖w‖.


For example, if our alphabet is Σ = {a, b, ca} then the length of the string w = bcaab is
‖w‖ = 4, since the string breaks down as follows: $x_1 = b$, $x_2 = ca$, $x_3 = a$, $x_4 = b$. So, our
$x_n$ is $x_4$ and therefore n = 4. Although you may think that ca is two separate symbols, our
chosen alphabet in fact classifies it as a single symbol.


A "special case" occurs when ‖w‖ = 0, i.e. w does not have any symbols in it. This string
is called the empty string. Instead of saying w = , we use λ to represent the empty string:
w = λ. This is similar to the practice of using a visible placeholder symbol to represent a
space, even though a space is really blank.


If your alphabet contains λ as a symbol, then you must use something else to denote the
empty string.


Suppose you also have a string v on the same alphabet as w. We turn w into $x_1 \cdots x_n$ just
as before, and similarly $v = y_1 \cdots y_m$. We say v is equal to w if and only if both m = n,
and for every i, $x_i = y_i$.


For example, suppose w = bba and v = bab, both strings on alphabet Σ = {a, b}. These
strings are not equal because the second symbols do not match.
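
The length and equality definitions can be checked mechanically. The sketch below splits a written word into alphabet symbols by backtracking, so that a multi-character symbol such as ca is counted once. The helper names `symbols`, `length` and `equal` are hypothetical, and the sketch assumes that every symbol is non-empty and that the decomposition is unique, as it is in the examples above.

```python
def symbols(word, alphabet):
    """Split `word` into alphabet symbols, or return None if that is impossible.

    Longer symbols are tried first; backtracking handles symbols such as "ca".
    """
    if word == "":
        return []
    for sym in sorted(alphabet, key=len, reverse=True):
        if word.startswith(sym):
            rest = symbols(word[len(sym):], alphabet)
            if rest is not None:
                return [sym] + rest
    return None


def length(word, alphabet):
    """The length ||w||: the number of alphabet symbols in the decomposition."""
    parts = symbols(word, alphabet)
    if parts is None:
        raise ValueError(f"{word!r} is not a string over this alphabet")
    return len(parts)


def equal(w, v, alphabet):
    """v equals w iff m = n and x_i = y_i for every i."""
    x, y = symbols(w, alphabet), symbols(v, alphabet)
    if x is None or y is None:
        raise ValueError("both arguments must be strings over the alphabet")
    return len(x) == len(y) and all(xi == yi for xi, yi in zip(x, y))
```

For instance, `length("bcaab", {"a", "b", "ca"})` counts the four symbols b, ca, a, b, and `length("", ...)` gives the length of the empty string.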
21.13 proof of homomorphic image of a Σ-structure is
a Σ-structure


We need to show that i(f) is closed under functions. For every constant symbol c of Σ,
$c^{B} = f(c^{A})$. Hence $c^{B} \in i(f)$. Also, if $b_1, \ldots, b_n \in i(f)$ and F is an n-ary function symbol of
Σ, then for some $a_1, \ldots, a_n \in A$ we have

\[ F^{B}(b_1, \ldots, b_n) = F^{B}(f(a_1), \ldots, f(a_n)) = f(F^{A}(a_1, \ldots, a_n)). \]

Hence $F^{B}(b_1, \ldots, b_n) \in i(f)$.
21.14 satisfaction relation


Alfred Tarski was the first mathematician to give a definition of what it means for a formula
to be "true" in a structure. To do this, we need to provide a meaning to terms and truth-
values to the formulas. In doing this, free variables cause a problem: what value are they
going to have? One possible answer is to supply temporary values for the free variables,
and define our notions in terms of these temporary values.


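Let A be a structure for the signature Σ. Suppose I is an interpretation and σ is a function
that assigns elements of A to variables. We define the function $\mathrm{Val}_{I,\sigma}$ inductively on the
construction of terms:

\[
\begin{aligned}
\mathrm{Val}_{I,\sigma}(c) &= I(c) && c \text{ a constant symbol} \\
\mathrm{Val}_{I,\sigma}(x) &= \sigma(x) && x \text{ a variable} \\
\mathrm{Val}_{I,\sigma}(f(t_1, \ldots, t_n)) &= I(f)(\mathrm{Val}_{I,\sigma}(t_1), \ldots, \mathrm{Val}_{I,\sigma}(t_n)) && f \text{ an } n\text{-ary function symbol}
\end{aligned}
\]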


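Now we are set to define satisfaction. Again we have to take care of free variables by assigning
temporary values to them via a function σ. We define the relation $A, \sigma \models \varphi$ by induction
on the construction of formulas:

\[
\begin{aligned}
A, \sigma \models t_1 = t_2 &\text{ if and only if } \mathrm{Val}_{I,\sigma}(t_1) = \mathrm{Val}_{I,\sigma}(t_2) \\
A, \sigma \models R(t_1, \ldots, t_n) &\text{ if and only if } (\mathrm{Val}_{I,\sigma}(t_1), \ldots, \mathrm{Val}_{I,\sigma}(t_n)) \in I(R) \\
A, \sigma \models \neg\varphi &\text{ if and only if } A, \sigma \not\models \varphi \\
A, \sigma \models \varphi \vee \psi &\text{ if and only if either } A, \sigma \models \varphi \text{ or } A, \sigma \models \psi \\
A, \sigma \models \exists x.\varphi(x) &\text{ if and only if for some } a \in A, \ A, \sigma[x/a] \models \varphi
\end{aligned}
\]

Here

\[ \sigma[x/a](y) = \begin{cases} a & \text{if } x = y \\ \sigma(y) & \text{else.} \end{cases} \]

In case for some φ of L we have $A, \sigma \models \varphi$, we say that A models, or is a model of,
or satisfies φ in environment, or context, σ. If φ has the free variables $x_1, \ldots, x_n$,
and $a_1, \ldots, a_n \in A$, we also write $A \models \varphi(a_1, \ldots, a_n)$ or $A \models \varphi(a_1/x_1, \ldots, a_n/x_n)$ instead of
$A, \sigma[x_1/a_1] \cdots [x_n/a_n] \models \varphi$. In case φ is a sentence (formula with no free variables), we write
$A \models \varphi$.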
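
For a finite structure the inductive clauses above can be run directly. The following is a sketch under stated assumptions: terms and formulas are encoded as nested tuples with the hypothetical tags "const", "var", "func", "eq", "rel", "not", "or" and "exists", the interpretation is a Python dict, and the cyclic groups used as examples are illustrative and not part of the entry.

```python
def val(term, interp, sigma):
    """Value of a term under interpretation `interp` and assignment `sigma`."""
    kind = term[0]
    if kind == "const":
        return interp[term[1]]
    if kind == "var":
        return sigma[term[1]]
    if kind == "func":
        args = [val(t, interp, sigma) for t in term[2]]
        return interp[term[1]](*args)
    raise ValueError(f"unknown term: {term!r}")


def satisfies(universe, interp, sigma, phi):
    """Decide whether the structure satisfies phi under the assignment sigma."""
    kind = phi[0]
    if kind == "eq":
        return val(phi[1], interp, sigma) == val(phi[2], interp, sigma)
    if kind == "rel":
        return tuple(val(t, interp, sigma) for t in phi[2]) in interp[phi[1]]
    if kind == "not":
        return not satisfies(universe, interp, sigma, phi[1])
    if kind == "or":
        return (satisfies(universe, interp, sigma, phi[1])
                or satisfies(universe, interp, sigma, phi[2]))
    if kind == "exists":
        x, body = phi[1], phi[2]
        # sigma[x/a]: the same assignment except that x now has the value a.
        return any(satisfies(universe, interp, {**sigma, x: a}, body)
                   for a in universe)
    raise ValueError(f"unknown formula: {phi!r}")


def cyclic_group(n):
    """Hypothetical example structure: Z/n with 0, addition and the usual order."""
    universe = range(n)
    interp = {
        "zero": 0,
        "plus": lambda a, b: (a + b) % n,
        "lt": {(a, b) for a in universe for b in universe if a < b},
    }
    return universe, interp


x, zero = ("var", "x"), ("const", "zero")
# "There is a nonzero x with x + x = 0", written with only not, or and exists.
has_element_of_order_two = (
    "exists", "x",
    ("not", ("or", ("eq", x, zero),
             ("not", ("eq", ("func", "plus", [x, x]), zero)))))
```

The sentence holds in Z/4 (witnessed by 2) and fails in Z/3, which can be checked with `satisfies(*cyclic_group(4), {}, has_element_of_order_two)` and the analogous call for 3.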
21.15 signature


A signature is the collection of a set of constant symbols, and for every natural number n,
a set of n-ary relation symbols and a set of n-ary function symbols.
21.16 strongly minimal


Let L be a first order language and let M be an L-structure. Let S, a subset of the domain
of M, be a definable infinite set. Then S is strongly minimal iff for every definable C ⊆ S
either C is finite or S \ C is finite. We say that M is strongly minimal iff the domain
of M is a strongly minimal set.


If M is strongly minimal and N ≡ M then N is strongly minimal. Thus if T is a complete
L-theory then we say T is strongly minimal if it has some model (equivalently all models)
which is strongly minimal.


Note that M is strongly minimal iff every definable subset of M is quantifier free definable
in a language with just equality. Compare this to the notion of o-minimality.
21.17 structure preserving mappings


Let Σ be a fixed signature, and A and B be two structures for Σ. The interesting functions
from A to B are the ones that preserve the structure.


A function f : A → B is said to be a homomorphism if and only if:


1. For every constant symbol c of Σ, $f(c^{A}) = c^{B}$.

2. For every natural number n and every n-ary function symbol F of Σ,
$f(F^{A}(a_1, \ldots, a_n)) = F^{B}(f(a_1), \ldots, f(a_n))$.

3. For every natural number n and every n-ary relation symbol R of Σ,

\[ R^{A}(a_1, \ldots, a_n) \Rightarrow R^{B}(f(a_1), \ldots, f(a_n)). \]


Homomorphisms with various additional properties have special names:
• An injective homomorphism is called a monomorphism.


• A surjective homomorphism is called an epimorphism.


• A bijective homomorphism is called a bimorphism.


• An injective homomorphism whose inverse function is also a homomorphism is called
an embedding.


• A surjective embedding is called an isomorphism.


• A homomorphism from a structure to itself (e.g., f : A → A) is called an endomorphism.


• An isomorphism from a structure to itself is called an automorphism.
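
For finite structures the three homomorphism conditions, and the extra condition for an embedding, can be checked by brute force. The sketch below is illustrative only: the dictionary layout of a structure, the helper `cyclic`, and the unary relation named "even" are assumptions introduced for the example.

```python
from itertools import product


def is_homomorphism(f, A, B):
    """Check the three homomorphism conditions for a map f (a dict) from A to B.

    Each structure is a dict with keys "universe", "constants" (name -> element),
    "functions" (name -> (arity, callable)) and
    "relations" (name -> (arity, set of tuples)).
    """
    for name, value in A["constants"].items():
        if f[value] != B["constants"][name]:
            return False
    for name, (arity, func_a) in A["functions"].items():
        func_b = B["functions"][name][1]
        for args in product(A["universe"], repeat=arity):
            if f[func_a(*args)] != func_b(*(f[a] for a in args)):
                return False
    for name, (_, rel_a) in A["relations"].items():
        rel_b = B["relations"][name][1]
        if any(tuple(f[a] for a in tup) not in rel_b for tup in rel_a):
            return False
    return True


def is_embedding(f, A, B):
    """Injective homomorphism that also reflects every relation."""
    if not is_homomorphism(f, A, B) or len(set(f.values())) != len(f):
        return False
    for name, (arity, rel_a) in A["relations"].items():
        rel_b = B["relations"][name][1]
        for tup in product(A["universe"], repeat=arity):
            if tuple(f[a] for a in tup) in rel_b and tup not in rel_a:
                return False
    return True


def cyclic(n, universe=None):
    """Hypothetical helper: addition mod n with a unary relation 'even'."""
    universe = list(range(n)) if universe is None else list(universe)
    return {
        "universe": universe,
        "constants": {"zero": 0},
        "functions": {"plus": (2, lambda a, b: (a + b) % n)},
        "relations": {"even": (1, {(a,) for a in universe if a % 2 == 0})},
    }


Z4, Z2 = cyclic(4), cyclic(2)
evens = cyclic(4, universe=[0, 2])           # the substructure {0, 2} of Z/4

mod2 = {a: a % 2 for a in Z4["universe"]}    # surjective but not injective
inclusion = {a: a for a in evens["universe"]}
```

In this example `mod2` is an epimorphism that is not an embedding, while `inclusion` is an embedding of {0, 2} into Z/4.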
21.18 structures


Suppose Σ is a signature and L is the corresponding first-order language. A Σ-
structure 𝔄 consists of a set A, called the universe of 𝔄, together with an interpretation
for the non-logical symbols contained in Σ. The interpretation of Σ in 𝔄 is an operation
I on sets that has the following properties:


1. For each constant symbol c, I(c) is an element of A.


2. For each n ∈ N, and each n-ary function symbol f, $I(f) : A^n \to A$ is a function from
$A^n$ to A.


3. For each n ∈ N, and each n-ary relation symbol R, I(R) is a subset of (n-ary relation on)
$A^n$.


Another commonly used notation is $I(c) = c^{\mathfrak{A}}$, $I(R) = R^{\mathfrak{A}}$, $I(f) = f^{\mathfrak{A}}$. For notational
convenience, when the context makes it clear in which structure we are working, we use the
elements of Σ to stand for both the symbols and their interpretation. When Σ is understood,
we call 𝔄 a structure, instead of a Σ-structure. In some texts, model may be used for
structure. Also, we shall write a ∈ 𝔄 instead of a ∈ A. Of course, there are many different
possibilities for the interpretation I. If 𝔄 is a structure, then the power of 𝔄, which we
denote |𝔄|, is the cardinality of its universe A. It is easy to see that the number of possibilities
for the interpretation I is at most $2^{|\mathfrak{A}|}$ when 𝔄 is infinite.
21.19 substructure


Let Σ be a fixed signature, and A and B structures for Σ. We say A is a substructure of
B, denoted A ⊆ B, if for all x ∈ A we have x ∈ B, and the inclusion map i : A → B : x ↦ x
is an embedding.
21.20 type


Let L be a first order language. Let M be an L-structure. Let B ⊆ M, and let $a \in M^n$.


Then we define the type of a over B to be the set of L-formulas φ(x, b) with parameters b
from B so that M ⊨ φ(a, b). A collection of L-formulas is a complete n-type over B iff it is
of the above form for some B, M and $a \in M^n$.


We call any consistent collection of formulas p in n variables with parameters from B a
partial n-type over B. (See criterion for consistency of sets of formulas.)

Note that a complete n-type p over B is consistent, so is in particular a partial type over
B. Also p is maximal in the sense that for every formula ψ(x, b) over B we have either
ψ(x, b) ∈ p or ¬ψ(x, b) ∈ p. In fact, for every collection of formulas p in n variables
the following are equivalent:


• p is the type of some sequence of n elements a over B in some model N ≡ M


• p is a maximal consistent set of formulas.
For n ∈ ω we define $S_n(B)$ to be the set of complete n-types over B.


Some authors define a collection of formulas p to be an n-type iff p is a partial n-type. Others
define p to be an n-type iff p is a complete n-type.


A type (resp. partial type/complete type) is any n-type (resp. partial type/complete type)
for some n ∈ ω.
21.21 upward Löwenheim-Skolem theorem


Let L be a first-order language and let A be an infinite L-structure. Then if κ is a cardinal
with κ ≥ Max(|A|, |L|) then there is an L-structure B of cardinality κ such that A is
elementarily embedded in B.
Chapter 22


03C15 — Denumerable structures 


22.1 random graph (infinite) 


Suppose we have some method M of generating sequences of letters from {p, q} so that at
each generation the probability of obtaining p is x, a real number strictly between 0 and 1.


Let $\{a_i : i < \omega\}$ be a set of vertices. For each i < ω, i ≥ 1, we construct a graph $G_i$ on the
vertices $a_1, \ldots, a_i$ recursively.


• $G_1$ is the unique graph on one vertex.
• For i > 1 we must describe for any j < k ≤ i when $a_j$ and $a_k$ are joined.


- If k < i then join $a_j$ and $a_k$ in $G_i$ iff $a_j$ and $a_k$ are joined in $G_{i-1}$.
- If k = i then generate a letter l(j, k) with M. Join $a_j$ to $a_k$ iff l(j, k) = p.


Now let Γ be the graph on $\{a_i : i < \omega\}$ so that for any n, m < ω, $a_n$ is joined to $a_m$ in Γ iff
it is in some $G_i$.


Then we call Γ a random graph. Consider the following property which we shall call f-
saturation:


Given any finite disjoint U and V, subsets of $\{a_i : i < \omega\}$, there is some
$a_n \in \{a_i : i < \omega\} \setminus (U \cup V)$ so that $a_n$ is joined to every point of U and no points in V.


Proposition 1. A random graph has f-saturation with probability 1.


Proof: Let $b_1, b_2, \ldots, b_n, \ldots$ be an enumeration of $\{a_i : i < \omega\} \setminus (U \cup V)$. We say that $b_i$ is
correctly joined to (U, V) iff it is joined to all the members of U and none of the members of
V. Then the probability that $b_i$ is not correctly joined is $1 - x^{|U|}(1 - x)^{|V|}$, which is some
real number y strictly between 0 and 1. The probability that none of the first m are correctly
joined is $y^m$, and the probability that none of the $b_i$ are correctly joined is $\lim_{m \to \infty} y^m = 0$.
Thus, with probability 1, one of the $b_i$ is correctly joined.
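
The probability estimate in the proof, and a finite stage of the construction, can be explored numerically. This is a rough sketch: the function names are hypothetical, vertices are numbered from 0 rather than from 1, and Python's `random.Random` stands in for the method M of the text.

```python
import random


def prob_not_correct(x, u_size, v_size):
    """Probability y that a fixed vertex is not correctly joined to (U, V)."""
    return 1 - x ** u_size * (1 - x) ** v_size


def random_graph(n, x, rng):
    """Edge set of the finite stage on vertices 0..n-1; each pair is joined with probability x."""
    edges = set()
    for k in range(n):
        for j in range(k):
            if rng.random() < x:          # plays the role of "l(j, k) = p"
                edges.add(frozenset((j, k)))
    return edges


def find_witness(n, edges, U, V):
    """First vertex outside U and V joined to all of U and to none of V, if any."""
    for b in range(n):
        if b in U or b in V:
            continue
        joined_to_all_u = all(frozenset((b, u)) in edges for u in U)
        joined_to_no_v = not any(frozenset((b, v)) in edges for v in V)
        if joined_to_all_u and joined_to_no_v:
            return b
    return None


y = prob_not_correct(0.5, 2, 1)           # |U| = 2, |V| = 1, x = 1/2
none_of_first_200 = y ** 200              # shrinks towards 0 as m grows

edges = random_graph(50, 0.5, random.Random(0))
witness = find_witness(50, edges, U={0, 1}, V={2})
```

A finite stage may fail to contain a witness, so `find_witness` can return `None`. The proposition only says that a witness appears somewhere in the infinite graph with probability 1.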


Proposition 2. Any two countable graphs with f-saturation are isomorphic.


Proof: This is via a back and forth argument. The property of f-saturation is exactly what
is needed.


Thus although the system of generation of a random graph looked as though it could deliver
many potentially different graphs, this is not the case. Thus we talk about the random
graph.


The random graph can also be constructed as a Fraïssé limit of all finite graphs, and in many
other ways. It is homogeneous and universal for the class of all countable graphs.


The theorem that almost every two infinite random graphs are isomorphic was first proved
in [1].


REFERENCES
1. Paul Erdős and Alfréd Rényi. Asymmetric graphs. Acta Math. Acad. Sci. Hung., 14:295-315,
1963.


Version: 2 Owner: bbukh Author(s): bbukh, Timmy 




Chapter 23 


03C35 — Categoricity and 
completeness of theories 


23.1 κ-categorical


Let L be a first order language and let S be a set of L-sentences. If κ is a cardinal then S
is said to be κ-categorical if S has a model of cardinality κ and any two such models are
isomorphic.


In other words, S is κ-categorical iff it has a unique model of cardinality κ, up to isomorphism.
23.2 Vaught's test


Let L be a first order language, and let S be a set of L-sentences with no finite models which
is κ-categorical for some κ ≥ |L|. Then S is complete.
23.3 proof of Vaught's test


Let φ be an L-sentence, and let A be the unique model of S of cardinality κ. Suppose A ⊨ φ.
Then if B is any model of S, B is infinite (as S has no finite models), so by the upward and
downward Löwenheim-Skolem theorems
there is a model C of S which is elementarily equivalent to B such that |C| = κ. Then C is
isomorphic to A, and so C ⊨ φ, and B ⊨ φ. So B ⊨ φ for all models B of S, so S ⊨ φ.
Similarly, if A ⊨ ¬φ then S ⊨ ¬φ. So S is complete.
Chapter 24


03C50 — Models with special 
properties (saturated, rigid, etc.) 


24.1 example of universal structure 


Let L be the first order language with the binary relation ≤. Consider the following sentences:


• ∀x, y ((x ≤ y ∨ y ≤ x) ∧ ((x ≤ y ∧ y ≤ x) ↔ x = y))


• ∀x, y, z (x ≤ y ∧ y ≤ z → x ≤ z)


Any L-structure satisfying these is called a linear order. We define the relation < so that
x < y iff x ≤ y ∧ x ≠ y. Now consider these sentences:


1. ∀x, y (x < y → ∃z (x < z < y))


2. ∀x ∃y, z (y < x < z)


A linear order that satisfies 1. is called dense. We say that a linear order that satisfies 2. is
without endpoints. Let T be the theory of dense linear orders without endpoints. This is a
complete theory.


We can see that (Q, ≤) is a model of T. It is actually a rather special model.


Theorem 3. Let (S, ≤) be any finite linear order. Then S embeds in (Q, ≤).


Proof: By induction on |S|; it is trivial for |S| = 1.
Suppose that the statement holds for all linear orders with cardinality less than or equal to
n. Let |S| = n + 1, then pick some a ∈ S, and let S' be the structure induced by S on S \ {a}.


Then there is some embedding e of S' into Q.


• Now suppose a is less than every member of S'. Then as Q is without endpoints, there
is some element b less than every element in the image of e. Thus we can extend e to
map a to b, which is an embedding of S into Q.


• We work similarly if a is greater than every element in S'.


• If neither of the above hold then we can pick some maximum $c_1 \in S'$ so that $c_1 < a$.
Similarly we can pick some minimum $c_2 \in S'$ so that $c_2 > a$. Now, as Q is dense, there is some b ∈ Q
with $e(c_1) < b < e(c_2)$. Then extending e by mapping a to b is the required embedding.
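
The induction step of this proof is effectively an algorithm. The sketch below follows the three cases using exact rationals. The names `extend_embedding` and `embed`, the use of `fractions.Fraction` to represent elements of Q, and the particular choices of b (one less than the minimum, one more than the maximum, or a midpoint) are assumptions of the sketch; any b in the right position would do.

```python
from fractions import Fraction


def extend_embedding(e, a, less):
    """Extend an order embedding e (dict: element -> Fraction) to a new element a.

    `less(x, y)` is the strict order on S.
    """
    below = [e[c] for c in e if less(c, a)]
    above = [e[c] for c in e if less(a, c)]
    if not below and not above:
        b = Fraction(0)                          # first element: any rational works
    elif not below:
        b = min(above) - 1                       # Q has no least element
    elif not above:
        b = max(below) + 1                       # Q has no greatest element
    else:
        b = (max(below) + min(above)) / 2        # Q is dense
    return {**e, a: b}


def embed(elements, less):
    """Embed a finite linear order into Q by adding one element at a time."""
    e = {}
    for a in elements:
        e = extend_embedding(e, a, less)
    return e


# Hypothetical example: five letters under alphabetical order.
letters = ["m", "c", "x", "a", "q"]
e = embed(letters, lambda s, t: s < t)
```

The same one-element extension step is what drives the back and forth argument later in this entry.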


It is easy to extend the above result to countable structures. One views a countable structure
as the union of an increasing chain of finite substructures. The necessary embedding is
the union of the embeddings of the substructures. Thus (Q, ≤) is a universal countable linear
order.


Theorem 4. (Q, ≤) is homogeneous.


Proof: The following type of proof is known as a back and forth argument. Let $S_1$ and $S_2$ be
two finite substructures of (Q, ≤). Let $e : S_1 \to S_2$ be an isomorphism. It is easier to think
of two disjoint copies B and C of Q with $S_1$ a substructure of B and $S_2$ a substructure of C.


Let $b_1, b_2, \ldots$ be an enumeration of $B \setminus S_1$. Let $c_1, c_2, \ldots$ be an enumeration of $C \setminus S_2$.
We iterate the following two step process:


The ith forth step: If $b_i$ is already in the domain of e then do nothing. If $b_i$ is not in the
domain of e then, as in Theorem 3, either $b_i$ is less than every element in the domain of
e, or greater than every such element, or it has an immediate predecessor and successor in
the domain of e. Either way, as C is dense and without endpoints, there is an element c in
C \ range(e) in the same position relative to the range of e. Thus we can extend the
isomorphism to include $b_i$ by setting $e(b_i) = c$.


The ith back step: If $c_i$ is already in the range of e then do nothing. If $c_i$ is not in the
range of e then exactly as above we can find some b ∈ B \ dom(e) and extend e so that
$e(b) = c_i$.


After ω stages, we have an isomorphism whose domain includes every $b_i$ and whose range
includes every $c_i$. Thus we have an isomorphism from B to C extending e.


A similar back and forth argument shows that any countable dense linear order without
endpoints is isomorphic to (Q, ≤), so T is ℵ₀-categorical.
24.2 homogeneous


Let L be a first order language. Let M be an L-structure. Then we say M is homogeneous
if the following holds:


if σ is an isomorphism between finite substructures of M, then σ extends to an automorphism
of M.
24.3 universal structure


Let L be a first order language and let R be an elementary class of L-structures. Let κ be
a cardinal, and let $R_\kappa$ be the set of structures from R with cardinality less than or equal to κ.


Let $M \in R_\kappa$. Suppose that for every $N \in R_\kappa$ there is an embedding of N into M. Then we
say M is universal.
Chapter 25


03C52 — Properties of classes of 
models 


25.1 amalgamation property 


A class of L-structures S has the amalgamation property iff whenever $A, B_1, B_2 \in S$ and
$f_i : A \to B_i$ are elementary embeddings for i ∈ {1, 2} then there is some C ∈ S and some
elementary embeddings $g_i : B_i \to C$ for i ∈ {1, 2} so that $g_1(f_1(x)) = g_2(f_2(x))$ for all x ∈ A.


Compare this with the free product with amalgamated subgroup for groups and the definition
of pushout contained there.
Chapter 26


03C64 — Model theory of ordered
structures; o-minimality


26.1 infinitesimal


Let R be a real closed field, for example the reals thought of as a structure in L, the language
of ordered rings. Let B be some set of parameters from R. Consider the following set of
formulas in L(B):

{x < b : b ∈ B ∧ b > 0}


Then this set of formulas is finitely satisfied, so by compactness is consistent. In fact this
set of formulas extends to a unique type p over B, as it defines a Dedekind cut. Thus there
is some model M containing B and some a ∈ M so that tp(a/B) = p.


Any such element will be called B-infinitesimal. In particular, suppose B = ∅. Then the
definable closure of B is the intersection of the reals with the algebraic numbers. Then a
∅-infinitesimal (or simply infinitesimal) is any element of any real closed field that is positive
but smaller than every real algebraic (positive) number.


As noted above such models exist, by compactness. One can construct them using ultraproducts,
see the entry on hyperreals. This is due to Abraham Robinson, who used such
fields to formulate nonstandard analysis.


Let K be any ordered ring, then K contains N. We say K is archimedean iff for every a ∈ K
there is some n ∈ N so that a < n. Otherwise K is non-archimedean.


Real closed fields with infinitesimal elements are non-archimedean: for a an infinitesimal we
have a < 1/n and thus 1/a > n for each n ∈ N.


Reference: A. Robinson, Selected papers of Abraham Robinson. Vol. II. Nonstandard analysis
and philosophy (New Haven, Conn., 1979)
26.2 o-minimality


Let M be an ordered structure. An interval in M is any subset of M that can be expressed
in one of the following forms:


• {x : a < x < b} for some a, b from M
• {x : x > a} for some a from M


• {x : x < a} for some a from M


Then we define M to be o-minimal iff every definable subset of M is a finite union of intervals
and points. This is a property of the theory of M, i.e. if M ≡ N and M is o-minimal, then
N is o-minimal. Note that M being o-minimal is equivalent to every definable subset of M
being quantifier free definable in the language with just the ordering. Compare this with
strong minimality.


The model theory of o-minimal structures is well understood; for an excellent account see
Lou van den Dries, Tame topology and o-minimal structures, CUP 1998. In particular,
although this condition is merely on definable subsets of M, it gives very good information
about definable subsets of $M^n$ for n ∈ ω.


Version: 4 Owner: Timmy Author(s): Timmy 


26.3 real closed fields 


It is clear that the axioms for a structure to be an ordered field can be written in L, the
first order language of ordered rings. It is also true that the following conditions can be
written in a schema of first order sentences in this language. For each odd degree polynomial
p ∈ K[x], p has a root.


Let A be all these sentences together with one that states that all positive elements have a
square root. Then one can show that the consequences of A are a complete theory T. It is
clear that this theory is the theory of the real numbers. We call any model of T a real closed
field.




The semi algebraic sets on a real closed field are Boolean combinations of solution sets of
polynomial equalities and inequalities. Tarski showed that T has quantifier elimination,
which is equivalent to the class of semi algebraic sets being closed under projection.


Let K be a real closed field. Consider the definable subsets of K. By quantifier elimination,
each is definable by a quantifier free formula, i.e. a Boolean combination of atomic formulas.


An atomic formula in one variable has one of the following forms:
• f(x) > g(x) for some f, g ∈ K[x]


• f(x) = g(x) for some f, g ∈ K[x].


The first defines a finite union of intervals; the second defines a finite union of points
(or, when f = g, all of K). Every definable subset of K is a finite Boolean combination of
these kinds of sets, so is a finite union of intervals
and points. Thus any real closed field is o-minimal.
Chapter 27


03C68 — Other classical first-order 
model theory 


27.1 imaginaries 


Given an algebraic structure S to investigate, mathematicians consider substructures,
restrictions of the structure, quotient structures and the like. A natural question for a
mathematician to ask if he is to understand S is “What structures naturally live in S?” We can
formalise this question in the following manner: Given some logic appropriate to the
structure S, we say another structure T is definable in S iff there is some definable subset T' of
$S^n$, a bijection σ : T' → T and a definable function (respectively relation) on T' for each
function (resp. relation) on T so that σ is an isomorphism (of the relevant type for T).


For an example take some infinite group (G, ·). Consider the centre of G, Z := {x ∈ G :
∀y ∈ G (xy = yx)}. Then Z is a first order definable subset of G, which forms a group with
the restriction of the multiplication, so (Z, ·) is a first order definable structure in (G, ·).


As another example consider the structure (R, +, ·, 0, 1) as a field. Then the structure (R, ≤)
is first order definable in the structure (R, +, ·, 0, 1), as for all x, y ∈ R we have x ≤ y iff
∃z (z² = y − x). Thus we know that (R, +, ·, 0, 1) is unstable as it has a definable order on
an infinite subset.


Returning to the first example, Z is normal in G, so the set of (left) cosets of Z form a
factor group. The domain of the factor group is the quotient of G under the equivalence relation
x ≡ y iff ∃z ∈ Z (xz = y). Therefore the factor group G/Z will not (in general) be a
definable structure, but would seem to be a “natural” structure. We therefore weaken our
formalisation of “natural” from definable to interpretable. Here we require that a
structure is isomorphic to some definable structure on equivalence classes of definable equivalence
relations. The equivalence classes of a ∅-definable equivalence relation are called imaginaries.
In [2] Poizat defined the property of Elimination of Imaginaries. This is equivalent to the
following definition:


Definition 1. A structure A with at least two distinct ∅-definable elements admits elimination
of imaginaries iff for every n ∈ N and ∅-definable equivalence relation ∼ on $A^n$ there is
a ∅-definable function $f : A^n \to A^p$ (for some p) such that for all x and y from $A^n$ we have


x ∼ y iff f(x) = f(y).


Given this property, we think of the function f as coding the equivalence classes of ∼,
and we call f(x) a code for x/∼. If a structure has elimination of imaginaries then every
interpretable structure is definable.


In [3] Shelah defined, for any structure A, a multi-sorted structure $A^{eq}$. This is done by adding
a sort for every ∅-definable equivalence relation, so that the equivalence classes are elements
(and code themselves). This is a closure operator, i.e. $A^{eq}$ has elimination of imaginaries.
See [1] chapter 4 for a good presentation of imaginaries and $A^{eq}$. The idea of passing to
$A^{eq}$ is very useful for many purposes. Unfortunately $A^{eq}$ has an unwieldy language and
theory. Also this approach does not answer the question above. We would like to show that
our structure has elimination of imaginaries with just a small selection of sorts added, and
perhaps in a simple language. This would allow us to describe the definable structures more
easily, and as we have elimination of imaginaries this would also describe the interpretable
structures.


REFERENCES


1. Wilfrid Hodges, A shorter model theory, Cambridge University Press, 1997.

2. Bruno Poizat, Une théorie de Galois imaginaire, Journal of Symbolic Logic, 48 (1983), pp.
1151-1170.
Chapter 28


03C90 — Nonclassical models 
(Boolean-valued, sheaf, etc.) 


28.1 Boolean valued model 


A traditional model of a language makes every formula of that language either true or
false. A Boolean valued model is a generalization in which formulas take on any value in a
Boolean algebra.


Specifically, a Boolean valued model of a signature Σ over the language L is a set A together
with a Boolean algebra B. Then the objects of the model are the functions $A^B = B \to A$.


For any formula φ, we can assign a value ‖φ‖ from the Boolean algebra. For example, if L is
the language of first order logic, a typical recursive definition of ‖φ‖ might look something
like this:


• $\|f = g\| = \bigvee_{f(b) = g(b)} b$
• $\|\neg\varphi\| = \|\varphi\|'$


• $\|\varphi \vee \psi\| = \|\varphi\| \vee \|\psi\|$
• $\|\exists x \, \varphi(x)\| = \bigvee_{f \in A^B} \|\varphi(f)\|$
Chapter 29


03C99 — Miscellaneous 


29.1 axiom of foundation 


The axiom of foundation (also called the axiom of regularity) is an axiom of set theory
prohibiting circular sets and sets with infinite levels of containment. Intuitively,
it states that every set can be built up from the empty set. There are several equivalent
formulations, for instance:
For any nonempty set X there is some y ∈ X such that y ∩ X = ∅.


For any set X, there is no function f from ω to the transitive closure of X such that
f(n + 1) ∈ f(n) for every n.


For any formula φ, if there is any set x such that φ(x) then there is some X such that φ(X)
but there is no y ∈ X such that φ(y).
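
The first formulation can be illustrated with hereditarily finite sets, modelled here as nested Python `frozenset` values. This is only an illustration under that modelling assumption: such values cannot contain themselves, so foundation holds automatically for them. The function names `minimal_element` and `transitive_closure` are hypothetical.

```python
def minimal_element(X):
    """Return some y in X with y ∩ X empty, as the first formulation promises."""
    for y in X:
        if not (y & X):
            return y
    return None  # not reached for a nonempty set of nested frozensets


def transitive_closure(X):
    """The elements of X, their elements, and so on."""
    seen = set()
    stack = list(X)
    while stack:
        y = stack.pop()
        if y not in seen:
            seen.add(y)
            stack.extend(y)
    return frozenset(seen)


# The first von Neumann ordinals: 0 = {}, 1 = {0}, 2 = {0, 1}.
zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})

X = frozenset({one, two})
y = minimal_element(X)
```

Here `two` meets X in the element `one`, so it is not minimal, while `one` has no element in X. Any chain obtained by repeatedly picking an element of the previous set must stop, which is the content of the second formulation.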


Version: 2 Owner: Henry Author(s): Henry 
29.2 elementarily equivalent 


If M and N are models of L then they are elementarily equivalent, denoted M ≡ N, iff
for every sentence φ:


M ⊨ φ iff N ⊨ φ
29.3 elementary embedding


If A and B are models of L such that for each t ∈ T, $A_t \subseteq B_t$, then we say B is an
elementary extension of A, or, equivalently, A is an elementary substructure of B if,
whenever φ is a formula of L with free variables included in $x_1, \ldots, x_n$ (of types $t_1, \ldots, t_n$)
and $a_1, \ldots, a_n$ are such that $a_i \in A_{t_i}$ for each i ≤ n then:


\[ A \models \varphi(a_1, \ldots, a_n) \text{ iff } B \models \varphi(a_1, \ldots, a_n) \]


If A and B are models of L then a collection of functions $f_t : A_t \to B_t$, one for each
t ∈ T, is an elementary embedding of A into B if whenever φ is a formula of L with free
variables included in $x_1, \ldots, x_n$ (of types $t_1, \ldots, t_n$) and $a_1, \ldots, a_n$ are such that $a_i \in A_{t_i}$ for
each i ≤ n then:


\[ A \models \varphi(a_1, \ldots, a_n) \text{ iff } B \models \varphi(f_{t_1}(a_1), \ldots, f_{t_n}(a_n)) \]
29.4 model


Let L be a logical language with function symbols F, relations R, and types T. Then
\[ M = \langle \{M_t \mid t \in T\}, \{f^{M} \mid f \in F\}, \{r^{M} \mid r \in R\} \rangle \]
is a model of L (also called an L-structure, or, if the underlying logic is clear, a Σ-structure,
where Σ is a signature specifying just F and R) if:
• Whenever f is an n-ary function symbol such that Type(f) = t and $\mathrm{Inputs}_n(f) = \langle t_1, \ldots, t_n \rangle$
then $f^{M} : \prod_{i=1}^{n} M_{t_i} \to M_t$
• Whenever r is an n-ary relation symbol such that $\mathrm{Inputs}_n(r) = \langle t_1, \ldots, t_n \rangle$ then $r^{M}$ is
a relation on $\prod_{i=1}^{n} M_{t_i}$
If s is a term of L of type t, without free variables, then it follows that $s = f s_1 \ldots s_n$ and
$s^{M} = f^{M}(s_1^{M}, \ldots, s_n^{M}) \in M_t$.
If φ is a sentence then we write M ⊨ φ (and say that M satisfies φ) if φ is true in M, where
truth of a relation is defined by:


• $R t_1 \ldots t_n$ is true if $R^{M}(t_1^{M}, \ldots, t_n^{M})$
• truth of a non-atomic formula is defined using the semantics of the underlying logic.


If Φ is a class of sentences, we write M ⊨ Φ if for every φ ∈ Φ, M ⊨ φ.


For any term s of L whose only free variables are included in $x_1, \ldots, x_n$ with types $t_1, \ldots, t_n$,
and for any $a_1, \ldots, a_n$ such that $a_i \in M_{t_i}$, define $s^{M}(a_1, \ldots, a_n)$ by:


• If $s = x_i$ then $s^{M}(a_1, \ldots, a_n) = a_i$
• If $s = f s_1 \ldots s_m$ then $s^{M}(a_1, \ldots, a_n) = f^{M}(s_1^{M}(a_1, \ldots, a_n), \ldots, s_m^{M}(a_1, \ldots, a_n))$
If φ is a formula whose only free variables are included in $x_1, \ldots, x_n$ with types $t_1, \ldots, t_n$,
then for any $a_1, \ldots, a_n$ such that $a_i \in M_{t_i}$, define $M \models \varphi(a_1, \ldots, a_n)$ recursively by:


• If $\varphi = R s_1 \ldots s_m$ then $M \models \varphi(a_1, \ldots, a_n)$ iff $R^{M}(s_1^{M}(a_1, \ldots, a_n), \ldots, s_m^{M}(a_1, \ldots, a_n))$


• Otherwise the truth of φ is determined by the semantics of the underlying logic.


As above, $M \models \Phi(a_1, \ldots, a_n)$ iff for every φ ∈ Φ, $M \models \varphi(a_1, \ldots, a_n)$.
29.5 proof equivalence of formulation of foundation


We show that the three formulations of the axiom of foundation given are equivalent.


1 ⇒ 2


Let X be a set and consider any function f : ω → tc(X). Consider Y = {f(n) | n < ω}. By
assumption, there is some f(n) ∈ Y such that f(n) ∩ Y = ∅, hence f(n + 1) ∉ f(n).


2 ⇒ 3


Let φ be some formula such that φ(x) is true and for every X such that φ(X), there is some
y ∈ X such that φ(y). Then define f(0) = x and let f(n + 1) be some y ∈ f(n) such that φ(y).
This would construct a function violating the assumption, so there is no such φ.
3 ⇒ 1


Let X be a nonempty set and define φ(x) ≡ x ∈ X. Then φ is true for some x, and by
assumption, there is some y such that φ(y) but there is no z ∈ y such that φ(z). Hence
y ∈ X but y ∩ X = ∅.
Chapter 30


03D10 — Turing machines and related 
notions 


30.1 Turing machine 


A Turing machine is an imaginary computing machine invented by Alan Turing to describe
what it means to compute something.


The “physical description” of a Turing machine is a box with a tape and a tape head. The
tape consists of an infinite number of cells stretching in both directions, with the tape head
always located over exactly one of these cells. Each cell has one of a finite number of symbols
written on it.


The machine has a finite set of states, and with every move the machine can change states,
change the symbol written on the current cell, and move one space left or right. The machine
has a program which specifies each move based on the current state and the symbol on
the current cell. The machine stops when it reaches a combination of state and symbol
for which no move is defined. One state is the start state, which the machine is in at the
beginning of a computation.


A Turing machine may be viewed as computing either a partial function or a relation. When
viewed as a function, the tape begins with a set of symbols which are the input, and when
the machine halts, whatever is on the tape is the output. For instance it is not difficult to
write a program which doubles a binary number, so input of 10 (with 0 on the first cell, 1
on the second, and all the rest blank) would give output 100. If the machine does not halt
on a particular input then the function is undefined on that input.
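

As a minimal sketch of the function view, the following Python simulator runs a program given as a table from (state, symbol) to (new state, symbol to write, move). The names `run`, `DOUBLE` and `BLANK`, the dictionary used for the tape, and the step budget are assumptions made for illustration and are not part of the definition above. The sample program doubles a binary number stored least significant bit first, as in the example, by writing a 0 on the blank cell to the left of the input.

```python
BLANK = "_"

def run(program, tape_input, start="start", max_steps=10_000):
    """Simulate a deterministic Turing machine on a two-way infinite tape.

    program maps (state, symbol) -> (new_state, symbol_to_write, "L" or "R").
    The machine halts when no move is defined for the current pair.
    Returns (halting_state, non-blank tape contents read left to right).
    """
    tape = {i: s for i, s in enumerate(tape_input)}  # cell index -> symbol
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in program:  # no move defined: the machine halts
            cells = [tape[i] for i in sorted(tape) if tape[i] != BLANK]
            return state, "".join(cells)
        state, symbol, move = program[key]
        tape[head] = symbol
        head += 1 if move == "R" else -1
    # We cannot tell "runs forever" from "needs more steps"; give up instead.
    raise RuntimeError("step budget exhausted; the machine may not halt")

# Doubling program (hypothetical): step left of the input, write a 0 there.
DOUBLE = {
    ("start", "0"): ("shift", "0", "L"),
    ("start", "1"): ("shift", "1", "L"),
    ("shift", BLANK): ("done", "0", "R"),
}

print(run(DOUBLE, "01"))  # cells 0,1 hold the number written as 10
```

The output tape reads 0, 0, 1 from left to right, which in the least-significant-bit-first layout is the number written 100. The step budget is a design compromise: a simulator can only observe halting, never non-halting.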


Alternatively, a Turing machine may be viewed as computing a relation. In that case the
initial symbols on the tape are again an input, and some states are designated “accepting.” If
the machine halts in an accepting state, the input is accepted; if it halts in any other state,
the input is rejected. A slight variation is when all states are accepting, and an input
is rejected if the machine never halts (of course, if the only method of determining whether the
machine will halt is watching it, then you can never be sure that it won’t stop at some point
in the future).


Another way for a Turing machine to compute a relation is to list its members
one by one. A relation is recursively enumerable if there is some Turing machine which can
list it in this way, or equivalently if there is a machine which halts in an accepting state only
on the members of the relation. A relation is recursive if it is recursively enumerable and its
complement is also. An equivalent definition is that there is a Turing machine which halts
in an accepting state only on members of the relation and always halts.


There are many variations on the definition of a Turing machine. The tape could be infinite 
in only one direction, having a first cell but no last cell. Even stricter, a tape could move in 
only one direction. It could be two (or more) dimensional. There could be multiple tapes, 
and some of them could be read only. The cells could have multiple tracks, so that they hold 
multiple symbols simultaneously. 


The programs mentioned above define only one move for each possible state and symbol
combination; these are called deterministic. Some programs define multiple moves for some
combinations. If the machine halts whenever there is any series of legal moves which leads to
a situation without moves, the machine is called non-deterministic. The notion is that the
machine guesses which move to use whenever there are multiple choices, and always guesses right.


Yet other machines are probabilistic; when given the choice between different moves they 
select one at random. 


No matter which of these variations is used, the recursive and recursively enumerable relations
and functions are unchanged (with two exceptions: one of the tapes has to move in two
directions, although it need not be infinite in both directions, and there can only be a finite
number of symbols, states, and tapes): the simplest imaginable machine, with a single
one-way infinite tape and only two symbols, is equivalent to the most elaborate imaginable
array of multidimensional tapes, lucky guesses, and fancy symbols.


However, not all these machines can compute at the same speed; the speed-up theorem states
that the number of moves it takes a machine to halt can be divided by an arbitrary constant
(the basic method involves increasing the number of symbols so that each cell encodes several 
cells from the original machine; each move of the new machine emulates several moves from 
the old one). 


In particular, the question P = NP, which asks whether an important class of deterministic
machines (those which have a polynomial function of the input length bounding the time it
takes them to halt) is the same as the corresponding class of non-deterministic machines, is
one of the major unsolved problems in modern mathematics.


Chapter 31


03D20 — Recursive functions and 
relations, subrecursive hierarchies 


31.1 primitive recursive 


The class of primitive recursive functions is the smallest class of functions on the naturals
(from $\mathbb{N}$ to $\mathbb{N}$) that
1. Includes


• the zero function: $z(x) = 0$
• the successor function: $s(x) = x + 1$
• the projection functions: $p_{n,m}(x_1, \ldots, x_n) = x_m$, $m \le n$
2. Is closed under
• composition: $h(x_1, \ldots, x_n) = f(g_1(x_1, \ldots, x_n), \ldots, g_m(x_1, \ldots, x_n))$


• primitive recursion: $h(x, 0) = f(x)$; $h(x, y + 1) = g(x, y, h(x, y))$


The primitive recursive functions are Turing-computable, but not all Turing-computable
functions are primitive recursive (see Ackermann’s function).
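

The closure rules can be tried out directly. The sketch below is only an illustration: the helper names `zero`, `succ`, `proj`, `compose` and `prim_rec` are assumptions, and Python integers stand in for the naturals. It builds addition and multiplication from the basic functions using nothing but composition and primitive recursion.

```python
def zero(*xs):
    """The zero function z(x) = 0."""
    return 0

def succ(x):
    """The successor function s(x) = x + 1."""
    return x + 1

def proj(n, m):
    """The projection p_{n,m}(x_1, ..., x_n) = x_m (1-indexed, m <= n)."""
    def p(*xs):
        assert len(xs) == n
        return xs[m - 1]
    return p

def compose(f, *gs):
    """h(x_1..x_n) = f(g_1(x_1..x_n), ..., g_m(x_1..x_n))."""
    return lambda *xs: f(*(g(*xs) for g in gs))

def prim_rec(f, g):
    """h(x, 0) = f(x); h(x, y + 1) = g(x, y, h(x, y)), with x a tuple of arguments."""
    def h(*args):
        *xs, y = args
        acc = f(*xs)
        for i in range(y):  # unfold the recursion from y = 0 upwards
            acc = g(*xs, i, acc)
        return acc
    return h

# add(x, 0) = x;  add(x, y + 1) = s(add(x, y))
add = prim_rec(proj(1, 1), compose(succ, proj(3, 3)))
# mul(x, 0) = 0;  mul(x, y + 1) = add(x, mul(x, y))
mul = prim_rec(zero, compose(add, proj(3, 1), proj(3, 3)))

print(add(3, 4), mul(3, 4))
```

The loop inside `prim_rec` always runs exactly `y` times, which reflects why every primitive recursive function is total.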


Further Reading 


• “Dave’s Homepage: Primitive Recursive Functions”: http://www.its.caltech.edu/ boozer/symbols/pı
• “Primitive recursive functions”: http://public.logica.com/ stepneys/cyc/p/primrec.htm


Chapter 32


03D25 — Recursively (computably) 
enumerable sets and degrees 


32.1 recursively enumerable 


For a language $L$, TFAE:


• There exists a Turing machine $f$ such that $\forall x.\ (x \in L) \Leftrightarrow$ the computation $f(x)$ terminates.
• There exists a total recursive function $f : \mathbb{N} \to L$ which is onto.
• There exists a total recursive function $f : \mathbb{N} \to L$ which is one-to-one and onto.


A language L fulfilling any (and therefore all) of the above conditions is called recursively 
enumerable. 
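

As a hedged illustration of the first condition, the sketch below is a program that terminates exactly when the hailstone sequence starting at `n` reaches 1, so it semi-decides that set. The function name and the optional `max_steps` cut-off are assumptions added for illustration; the rule used is the usual one (halve an even number, replace an odd number `n` by `3n + 1`).

```python
def hailstone_steps(n, max_steps=None):
    """Return the number of steps for the hailstone sequence from n to reach 1.

    With max_steps=None this loops forever if the sequence never reaches 1,
    which is exactly the behaviour of a semi-decision procedure.
    With a cut-off it returns None when the budget runs out (answer unknown).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        if max_steps is not None and steps >= max_steps:
            return None
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(hailstone_steps(6))
print(hailstone_steps(27, max_steps=10))
```

A returned `None` says nothing about membership; only termination without the cut-off certifies that `n` belongs to the set.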


Examples 


1. Any recursive language. 
2. The set of encodings of Turing machines which halt when given no input. 
3. The set of encodings of theorems of 


4. The set of integers $n$ for which the hailstone sequence starting at $n$ reaches 1. (We
don’t know if this set is recursive, or even if it is $\mathbb{N}$; but a trivial program shows it is
recursively enumerable.)


Chapter 33


03D75 — Abstract and axiomatic 
computability and recursion theory 


33.1 Ackermann function 


Ackermann’s function $A(x, y)$ is defined by the recurrence relations


$A(0, y) = y + 1$
$A(x + 1, 0) = A(x, 1)$
$A(x + 1, y + 1) = A(x, A(x + 1, y))$


Ackermann’s function is an example of a recursive function that is not primitive recursive
but is instead $\mu$-recursive (that is, Turing-computable).
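

The recurrence relations can be transcribed almost literally. The following sketch is an illustration only; the memoisation via `functools.lru_cache` is an added convenience and not part of the definition. It checks the values for $x \le 3$ against the closed forms $y + 2$, $2y + 3$ and $2^{y+3} - 3$ on a few small arguments.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ackermann(x, y):
    """Ackermann's function, following the three recurrence relations."""
    if x == 0:
        return y + 1
    if y == 0:
        return ackermann(x - 1, 1)
    return ackermann(x - 1, ackermann(x, y - 1))

for y in range(5):
    assert ackermann(1, y) == y + 2
    assert ackermann(2, y) == 2 * y + 3
    assert ackermann(3, y) == 2 ** (y + 3) - 3

print([ackermann(3, y) for y in range(5)])
```

Arguments with $x \ge 4$ are deliberately not attempted: already $A(4, 2)$ has tens of thousands of decimal digits, and a naive recursion cannot reach it.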


Ackermann’s function grows extremely fast. In fact, we find that


$A(0, y) = y + 1$
$A(1, y) = 2 + (y + 3) - 3$
$A(2, y) = 2 \cdot (y + 3) - 3$
$A(3, y) = 2^{y+3} - 3$
$A(4, y) = 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}} - 3$ (a tower of $y + 3$ twos)


... and at this point conventional notation breaks down, and we need to employ something
like Conway notation or Knuth notation for large numbers.


Ackermann’s function wasn’t actually written in this form by its namesake, Wilhelm Ackermann.
Instead, Ackermann found that the $z$-fold exponentiation of $x$ with $y$ was an example
of a recursive function which was not primitive recursive. Later this was simplified by Rózsa
Péter to a function of two variables, similar to the one given above.


33.2 halting problem


The halting problem is to determine, given a particular input to a particular computer 
program, whether the program will terminate after a finite number of steps. 


The consequences of a solution to the halting problem are far-reaching. Consider some 
predicate P(x) regarding natural numbers; suppose we conjecture that P(x) holds for all 
x € N. (Goldbach’s conjecture, for example, takes this form.) We can write a program 
that will count up through the natural numbers and terminate upon finding some n such 
that P(n) is false; if the conjecture holds in general, then our program will never terminate. 
Then, without running the program, we could pass it along to a halting program to 
prove or disprove the conjecture. 


In 1936, Alan Turing proved that the halting problem is undecidable; the argument is
presented here informally. Consider a hypothetical program that decides the halting
problem:


Algorithm HALT(P, I)
Input: A computer program P and some input I for P
Output: True if P halts on I and false otherwise


The implementation of the algorithm, as it turns out, is irrelevant. Now consider another 
program: 


Algorithm BREAK(x)
Input: An irrelevant parameter x
Output:
begin
    if HALT(BREAK, x) then
        while true do
            nothing
    else
        BREAK ← true
end
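

The construction can be sketched in Python for any claimed decider. Everything here is hypothetical: `make_break` and `always_no` are illustrative names, and `always_no` is just one (obviously wrong) candidate for HALT, chosen because the resulting program can actually be run.

```python
def make_break(halts):
    """Build BREAK from a claimed halting decider halts(program, x) -> bool."""
    def break_(x):
        if halts(break_, x):
            while True:  # the decider predicted "halts", so never halt
                pass
        return True      # the decider predicted "does not halt", so halt at once
    return break_

def always_no(program, x):
    """A candidate decider that claims no program ever halts."""
    return False

brk = make_break(always_no)
result = brk(0)                      # returns immediately
print(result, always_no(brk, 0))     # halted, although the decider said it would not
```

A candidate that answers True for `brk` fails in the opposite way, since `brk` then loops forever; that case is not executed here for obvious reasons.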


In other words, we can design a program that will break any solution to the halting problem.
If our halting solution determines that BREAK halts, then BREAK will immediately enter an infinite
loop; otherwise, BREAK will return immediately. In either case HALT gives the wrong answer
about BREAK, so we must conclude that the HALT program does not decide the halting problem.


Chapter 34


03E04 — Ordered sets and their 
cofinalities; pcf theory 


34.1 another definition of cofinality 


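Let $\kappa$ be a limit ordinal (e.g. a cardinal). The cofinality $\operatorname{cf}(\kappa)$ of $\kappa$ could also be defined as:
$\operatorname{cf}(\kappa) = \inf\{|U| : U \subseteq \kappa \text{ s.t. } \sup U = \kappa\}$


($\sup U$ is calculated using the natural order of the ordinals). The cofinality of a cardinal is
always a regular cardinal and hence $\operatorname{cf}(\kappa) = \operatorname{cf}(\operatorname{cf}(\kappa))$.


This definition is equivalent to the parent definition.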


Version: 5 Owner: x_bas Author(s): x_bas 


34.2 cofinality 


If $\alpha$ is an ordinal and $X \subseteq \alpha$ then $X$ is said to be cofinal in $\alpha$ if whenever $y \in \alpha$ there is
$x \in X$ with $y \le x$.


A map $f : \alpha \to \beta$ between ordinals $\alpha$ and $\beta$ is said to be cofinal if the image of $f$ is cofinal
in $\beta$.


If $\beta$ is an ordinal, the cofinality $\operatorname{cf}(\beta)$ of $\beta$ is the least ordinal $\alpha$ such that there is a cofinal
map $f : \alpha \to \beta$. Note that $\operatorname{cf}(\beta) \le \beta$, because the identity map on $\beta$ is cofinal.


It is not hard to show that the cofinality of any ordinal is a cardinal, in fact a regular cardinal:
a cardinal $\kappa$ is said to be regular if $\operatorname{cf}(\kappa) = \kappa$ and singular if $\operatorname{cf}(\kappa) < \kappa$.


For any infinite cardinal $\kappa$ it can be shown that $\kappa < \kappa^{\operatorname{cf}(\kappa)}$, and so also $\kappa < \operatorname{cf}(2^\kappa)$.


Examples


0 and 1 are regular cardinals. All other finite cardinals have cofinality 1 and are therefore
singular.


$\aleph_0$ is regular.
Any infinite successor cardinal is regular.


The smallest infinite singular cardinal is $\aleph_\omega$. In fact, the map $f : \omega \to \aleph_\omega$ given by $f(n) = \omega_n$
is cofinal, so $\operatorname{cf}(\aleph_\omega) = \aleph_0$. Note that $\operatorname{cf}(2^{\aleph_0}) > \aleph_0$, and consequently $2^{\aleph_0} \ne \aleph_\omega$.


34.3 maximal element
Let $\le$ be an ordering on a set $S$, and let $A \subseteq S$. Then, with respect to the ordering $\le$,


• $a \in A$ is the least element of $A$ if $a \le x$ for all $x \in A$.
• $a \in A$ is a minimal element of $A$ if there exists no $x \in A$ such that $x \le a$ and $x \ne a$.
• $a \in A$ is the greatest element of $A$ if $x \le a$ for all $x \in A$.
• $a \in A$ is a maximal element of $A$ if there exists no $x \in A$ such that $a \le x$ and $x \ne a$.
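

On a finite set the four notions can be computed by brute force, which makes the difference between "least" and "minimal" concrete. The sketch below is illustrative: the function names and the choice of the divisibility order on $\{2, \ldots, 20\}$ are assumptions picked for the example.

```python
def least(A, leq):
    """Elements a with a <= x for all x in A (at most one in a partial order)."""
    return [a for a in A if all(leq(a, x) for x in A)]

def greatest(A, leq):
    """Elements a with x <= a for all x in A."""
    return [a for a in A if all(leq(x, a) for x in A)]

def minimal(A, leq):
    """Elements a with no x in A strictly below a."""
    return [a for a in A if not any(leq(x, a) and x != a for x in A)]

def maximal(A, leq):
    """Elements a with no x in A strictly above a."""
    return [a for a in A if not any(leq(a, x) and x != a for x in A)]

def divides(a, b):
    """The divisibility order: a | b."""
    return b % a == 0

A = list(range(2, 21))
print(least(A, divides))     # no least element
print(minimal(A, divides))   # the primes up to 20
print(maximal(A, divides))   # numbers with no proper multiple in A
```

Adding 1 to the set gives it a least element, while the maximal elements stay the same.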


Examples.


• The natural numbers $\mathbb{N}$ ordered by divisibility ($\mid$) have a least element, 1. The natural
numbers greater than 1 ($\mathbb{N} \setminus \{1\}$) have no least element, but infinitely many minimal
elements (the primes). In neither case is there a greatest or maximal element.


• The negative integers ordered by the standard definition of $\le$ have a maximal element
which is also the greatest element, $-1$. They have no minimal or least element.


• The natural numbers $\mathbb{N}$ ordered by the standard $\le$ have a least element, 1, which is
also a minimal element. They have no greatest or maximal element.


• The rationals greater than zero with the standard ordering $\le$ have no least element or
minimal element, and no maximal or greatest element.


34.4 partitions less than cofinality


If $\lambda < \operatorname{cf}(\kappa)$ then $\kappa \to (\kappa)^1_\lambda$.


This follows easily from the definition of cofinality. For any coloring $f : \kappa \to \lambda$ define
$g : \lambda \to \kappa + 1$ by $g(\alpha) = |f^{-1}(\alpha)|$. Then $\kappa = \sum_{\alpha < \lambda} g(\alpha)$, and by the normal rules of cardinal
arithmetic $\sup_{\alpha < \lambda} g(\alpha) = \kappa$. Since $\lambda < \operatorname{cf}(\kappa)$, there must be some $\alpha < \lambda$ such that $g(\alpha) = \kappa$.


34.5 well ordered set


A well-ordered set is a totally ordered set in which every nonempty subset has a least
member.


An example of a well-ordered set is the set of positive integers with the standard order relation
$(\mathbb{Z}^+, <)$, because any nonempty subset of it has a least member. However, $\mathbb{R}^+$ (the positive
reals) is not a well-ordered set with the usual order, because $(0, 1) = \{x : 0 < x < 1\}$ is a
nonempty subset but it doesn’t contain a least number.


A well-ordering of a set $X$ is the result of defining a binary relation $\le$ on $X$ to itself in
such a way that $X$ becomes well-ordered with respect to $\le$.


Version: 9 Owner: drini Author(s): drini, vypertd 


34.6 pigeonhole principle 
For any natural number $n$, there does not exist a bijection between $n$ and a proper subset
of $n$.


The name of the theorem is based upon the observation that pigeons will not occupy a
pigeonhole that already contains a pigeon, so there is no way to fit $n$ pigeons in fewer than
$n$ pigeonholes.


34.7 proof of pigeonhole principle


It will first be proven that, if a bijection exists between two finite sets, then the two sets
have the same number of elements.


Let $S$ and $T$ be finite sets and $f : S \to T$ be a bijection. Since $f$ is injective, $|S| =
|\operatorname{ran} f|$. Since $f$ is surjective, $|T| = |\operatorname{ran} f|$. Thus, $|S| = |T|$.


Since the pigeonhole principle is the contrapositive of the proven statement, it follows that
the pigeonhole principle holds.


Version: 2 Owner: Wkbj79 Author(s): Wkbj79 


34.8 tree (set theoretic) 
In set theory, a tree is defined to be a set $T$ and a relation $<_T \subseteq T \times T$ such that:


• $<_T$ is a partial ordering of $T$
• For any $t \in T$, $\{s \in T \mid s <_T t\}$ is well-ordered


The nodes immediately greater than a node are termed its children, the node immediately
less is its parent (if it exists), any node less is an ancestor and any node greater is a
descendant. A node with no ancestors is a root.


The partial ordering represents distance from the root, and the well-ordering requirement
prohibits any loops or splits below a node (that is, each node has at most one parent, and
therefore at most one grand-parent, and so on). Since there is generally no requirement that
the tree be connected, the null ordering makes any set into a tree, although the tree is a
trivial one, since each element of the set forms a single node with no children.


Since the set of ancestors of any node is well-ordered, we can associate it with an ordinal.
We call this the height, and write: $\operatorname{ht}(t) = \operatorname{o.t.}(\{s \in T \mid s <_T t\})$. This all accords with
normal usage: a root has height 0, something immediately above the root has height 1, and
so on. We can then assign a height to the tree itself, which we define to be the least number
greater than the height of any element of the tree. For finite trees this is just one greater
than the height of its tallest element, but infinite trees may not have a tallest element, so
we define $\operatorname{ht}(T) = \sup\{\operatorname{ht}(t) + 1 \mid t \in T\}$.


For every $\alpha < \operatorname{ht}(T)$ we define the $\alpha$-th level to be the set $T_\alpha = \{t \in T \mid \operatorname{ht}(t) = \alpha\}$. So
of course $T_0$ is all roots of the tree. If $\alpha < \operatorname{ht}(T)$ then $T(\alpha)$ is the subtree of elements with
height less than $\alpha$: $t \in T(\alpha) \leftrightarrow t \in T \wedge \operatorname{ht}(t) < \alpha$.


We call a tree a $\kappa$-tree for any cardinal $\kappa$ if $|T| = \kappa$ and $\operatorname{ht}(T) = \kappa$. If $\kappa$ is finite, the only way
to do this is to have a single branch of length $\kappa$.


34.9 $\kappa$-complete


A structured set $S$ (typically a filter or a Boolean algebra) is $\kappa$-complete if, given any $K \subseteq S$
with $|K| < \kappa$, $\bigcap K \in S$. It is complete if it is $\kappa$-complete for all $\kappa$.


Similarly, a partial order is $\kappa$-complete if any sequence of fewer than $\kappa$ elements has an
upper bound within the partial order.


An $\aleph_1$-complete structure is called countably complete.


34.10 Cantor’s diagonal argument


One of the starting points in Cantor’s development of set theory was his discovery that there
are different degrees of infinity. The rational numbers, for example, are countably infinite; it
is possible to enumerate all the rational numbers by means of an infinite list. By contrast,
the real numbers are uncountable; it is impossible to enumerate them by means of an
infinite list. These discoveries underlie the idea of cardinality, which is expressed by saying
that two sets have the same cardinality if there exists a bijective correspondence between
them.


In essence, Cantor discovered two theorems: first, that the set of real numbers has the same
cardinality as the power set of the naturals; and second, that a set and its power set have a
different cardinality (see Cantor’s theorem). The proof of the second result is based on the
celebrated diagonalization argument.


Cantor showed that for every given infinite sequence of real numbers $x_1, x_2, x_3, \ldots$ it is
possible to construct a real number $x$ that is not on that list. Consequently, it is impossible
to enumerate the real numbers; they are uncountable. No generality is lost if we suppose
that all the numbers on the list are between 0 and 1. Certainly, if this subset of the real
numbers is uncountable, then the full set is uncountable as well.


Let us write our sequence as a table of decimal expansions:


$0.\ d_{11}\ d_{12}\ d_{13}\ d_{14}\ \ldots$
$0.\ d_{21}\ d_{22}\ d_{23}\ d_{24}\ \ldots$
$0.\ d_{31}\ d_{32}\ d_{33}\ d_{34}\ \ldots$
$0.\ d_{41}\ d_{42}\ d_{43}\ d_{44}\ \ldots$
$\vdots$


where
$x_n = 0.d_{n1} d_{n2} d_{n3} d_{n4} \ldots$


and the expansion avoids an infinite trailing string of the digit 9.


For each $n = 1, 2, \ldots$ we choose a digit $c_n$ that is different from $d_{nn}$ and not equal to 9, and
consider the real number $x$ with decimal expansion


$0.c_1 c_2 c_3 \ldots$


By construction, this number is different from every member of the given sequence. After
all, for every $n$, the number $x$ differs from the number $x_n$ in the $n$th decimal digit. The claim
is proven.
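

The diagonal construction is easy to carry out on a finite piece of such a table. The sketch below is only an illustration under stated assumptions: each row is given as a string holding at least as many decimal digits as there are rows, and the particular rule for choosing $c_n$ (use 5, or 4 when the diagonal digit is already 5) is one arbitrary choice that avoids both $d_{nn}$ and 9.

```python
def diagonal_digits(rows):
    """Return digits c_1 c_2 ... with c_n != d_nn and c_n != 9.

    rows[n] is the string of decimal digits of the n-th number (0-indexed),
    and must contain at least n + 1 digits.
    """
    digits = []
    for n, row in enumerate(rows):
        d_nn = row[n]  # the n-th digit of the n-th number
        digits.append("4" if d_nn == "5" else "5")
    return "".join(digits)

rows = ["5000", "3333", "1251", "0005"]
c = diagonal_digits(rows)
print("0." + c)

# The new number differs from the n-th row in its n-th digit.
assert all(c[n] != rows[n][n] for n in range(len(rows)))
```

Avoiding the digit 9 matters because of double representations such as 0.4999... = 0.5000...; two expansions that differ in one digit could otherwise denote the same real number.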
34.11 Fodor’s lemma


If $\kappa$ is a regular uncountable cardinal, $S$ is a stationary subset of $\kappa$, and $f : \kappa \to \kappa$ is
regressive on $S$ (that is, $f(\alpha) < \alpha$ for any $\alpha \in S$) then there is some $\gamma$ and some stationary
$S_0 \subseteq S$ such that $f(\alpha) = \gamma$ for any $\alpha \in S_0$.


34.12 Schroeder-Bernstein theorem


Let $S$ and $T$ be sets. If there exists an injection $f : S \to T$ and an injection $g : T \to S$, then
$S$ and $T$ have the same cardinality.


The Schroeder-Bernstein theorem is useful for proving many results about cardinality, since
it replaces one hard problem (finding a bijection between $S$ and $T$) with two generally easier
problems (finding two injections).


34.13 Veblen function


The Veblen function is used to obtain larger ordinal numbers than those provided by
exponentiation. It builds on a hierarchy of closed and unbounded classes:


• $\operatorname{Cr}(0)$ is the additively indecomposable numbers, $\mathbb{H}$


• $\operatorname{Cr}(Sn) = \operatorname{Cr}(n)'$, the set of fixed points of the enumerating function of $\operatorname{Cr}(n)$
• $\operatorname{Cr}(\lambda) = \bigcap_{\alpha < \lambda} \operatorname{Cr}(\alpha)$


The Veblen function $\varphi_\alpha \beta$ is defined by setting $\varphi_\alpha$ equal to the enumerating function of $\operatorname{Cr}(\alpha)$.


We call a number $\alpha$ strongly critical if $\alpha \in \operatorname{Cr}(\alpha)$. The class of strongly critical ordinals
is written SC, and the enumerating function is written $f_{SC}(\alpha) = \Gamma_\alpha$.


$\Gamma_0$, the first strongly critical ordinal, is also called the Feferman-Schütte ordinal.


34.14 additively indecomposable


An ordinal $\alpha$ is called additively indecomposable if it is not 0 and for any $\beta, \gamma < \alpha$,
$\beta + \gamma < \alpha$. The set of additively indecomposable ordinals is denoted $\mathbb{H}$.


$1 \in \mathbb{H}$, since $0 + 0 < 1$. Also $\omega \in \mathbb{H}$ since the sum of two finite numbers is still
finite, and no finite numbers other than 1 are in $\mathbb{H}$.


$\mathbb{H}$ is closed and unbounded, so the enumerating function of $\mathbb{H}$ is normal. In fact, $f_{\mathbb{H}}(\alpha) = \omega^\alpha$.


The derivative $f'_{\mathbb{H}}(\alpha)$ is written $\epsilon_\alpha$. The number $\epsilon_0 = \omega^{\omega^{\omega^{\cdot^{\cdot^{\cdot}}}}}$, therefore, is the first fixed point
of the series $\omega, \omega^\omega, \omega^{\omega^\omega}, \ldots$


34.15 cardinal number


A cardinal number is an ordinal number $S$ with the property that $S \subseteq X$ for every ordinal
number $X$ which has the same cardinality as $S$.


34.16 cardinal successor


The cardinal successor of a cardinal $\kappa$ is the least cardinal greater than $\kappa$. It is denoted $\kappa^+$.


34.17 cardinality


Cardinality is a notion of the size of a set which does not rely on numbers. It is a relative
notion because, for instance, two sets may each have an infinite number of elements, but one
may have a greater cardinality. That is, it may have a “more infinite” number of elements.


The formal definition of cardinality rests upon the notion of a one-to-one mapping between
sets.


Definition.
Sets $A$ and $B$ have the same cardinality if there is a one-to-one and onto function $f$ from $A$
to $B$ (a bijection). Symbolically, we write $|A| = |B|$. This is also called equipotence.


Results. 


1. A is equipotent to A. 
2. If A is equipotent to B, then B is equipotent to A. 


3. If A is equipotent to B and B is equipotent to C, then A is equipotent to C. 
Proof. 


1. The identity function on $A$ is a bijection from $A$ to $A$.

2. If $f$ is a bijection from $A$ to $B$, then $f^{-1}$ exists and is a bijection from $B$ to $A$.

3. If $f$ is a bijection from $A$ to $B$ and $g$ is a bijection from $B$ to $C$, then $g \circ f$ is a bijection
from $A$ to $C$.


Example. 


The set of even integers $E$ has the same cardinality as the set of integers $\mathbb{Z}$. We define
$f : E \to \mathbb{Z}$ such that $f(x) = \frac{x}{2}$. Then $f$ is a bijection, therefore $|E| = |\mathbb{Z}|$.


34.18 cardinality of a countable union


Let $C$ be a countable collection of countable sets. Then $\bigcup C$ is countable.


34.19 cardinality of the rationals


The set of rational numbers $\mathbb{Q}$ is countable, and therefore its cardinality is $\aleph_0$.


34.20 classes of ordinals and enumerating functions


A class of ordinals is just a subclass of the ordinals. For every class of ordinals $M$ there is an
enumerating function $f_M$ defined by transfinite recursion:
$f_M(\alpha) = \min\{x \in M \mid f_M(\beta) < x \text{ for all } \beta < \alpha\}$


This function simply lists the elements of $M$ in order. Note that it is not necessarily defined
for all ordinals, although it is defined for a segment of the ordinals. Let $\operatorname{otype}(M) = \operatorname{dom}(f_M)$
be the order type of $M$, which is either On or some ordinal $\alpha$. If $\alpha < \beta$ then $f_M(\alpha) <
f_M(\beta)$, so $f_M$ is an order isomorphism between $\operatorname{otype}(M)$ and $M$.


We say $M$ is $\kappa$-closed if for any $N \subseteq M$ such that $|N| < \kappa$, also $\sup N \in M$.


We say $M$ is $\kappa$-unbounded if for any $\alpha < \kappa$ there is some $\beta \in M$ such that $\alpha < \beta$.


We say a function $f : M \to \text{On}$ is $\kappa$-continuous if $M$ is $\kappa$-closed and


$f(\sup N) = \sup\{f(\alpha) \mid \alpha \in N\}$


A function is $\kappa$-normal if it is order preserving ($\alpha < \beta$ implies $f(\alpha) < f(\beta)$) and $\kappa$-continuous.
In particular, the enumerating function of a $\kappa$-closed class is always $\kappa$-normal.


All these definitions can be easily extended to all ordinals: a class is closed (resp. unbounded)
if it is $\kappa$-closed (unbounded) for all $\kappa$. A function is continuous (resp. normal)
if it is $\kappa$-continuous (normal) for all $\kappa$.


34.21 club


If $\kappa$ is a cardinal then a set $C \subseteq \kappa$ is closed if, for any $S \subseteq C$ and $\alpha < \kappa$, $\sup(S \cap \alpha) = \alpha$
implies $\alpha \in C$. (That is, if the limit of some sequence in $C$ is less than $\kappa$ then the limit is also
in $C$.)


If $\kappa$ is a cardinal and $C \subseteq \kappa$ then $C$ is unbounded if, for any $\alpha < \kappa$, there is some $\beta \in C$
such that $\alpha < \beta$.


If a set is both closed and unbounded then it is a club set.


34.22 club filter


If $\kappa$ is a regular uncountable cardinal then $\operatorname{club}(\kappa)$, the filter of all sets containing a club
subset of $\kappa$, is a $\kappa$-complete filter closed under diagonal intersection called the club filter.


To see that this is a filter, note that $\kappa \in \operatorname{club}(\kappa)$ since it is both closed and unbounded.
If $x \in \operatorname{club}(\kappa)$ then any subset of $\kappa$ containing $x$ is also in $\operatorname{club}(\kappa)$, since $x$, and
therefore anything containing it, contains a club set.


It is a $\kappa$-complete filter because the intersection of fewer than $\kappa$ club sets is a club set. To
see this, suppose $\langle C_i \rangle_{i < \alpha}$ is a sequence of club sets where $\alpha < \kappa$. Obviously $C = \bigcap C_i$ is
closed, since any sequence which appears in $C$ appears in every $C_i$, and therefore its limit is
also in every $C_i$. To show that it is unbounded, take some $\beta < \kappa$. Let $\langle \beta_{1,i} \rangle$ be an increasing
sequence with $\beta_{1,1} > \beta$ and $\beta_{1,i} \in C_i$ for every $i < \alpha$. Such a sequence can be constructed,
since every $C_i$ is unbounded. Since $\alpha < \kappa$ and $\kappa$ is regular, the limit of this sequence is less
than $\kappa$. We call it $\beta_2$, and define a new sequence $\langle \beta_{2,i} \rangle$ similar to the previous sequence.
We can repeat this process, getting a sequence of sequences $\langle \beta_{j,i} \rangle$ where each element of a
sequence is greater than every member of the previous sequences. Then for each $i < \alpha$, $\langle \beta_{j,i} \rangle$
is an increasing sequence contained in $C_i$, and all these sequences have the same limit (the
limit of $\langle \beta_{j,i} \rangle$). This limit is then contained in every $C_i$, and therefore $C$, and is greater than
$\beta$.


To see that $\operatorname{club}(\kappa)$ is closed under diagonal intersection, let $\langle C_i \rangle$, $i < \kappa$ be a sequence, and
let $C = \Delta_{i < \kappa} C_i$. Since the diagonal intersection contains the intersection, obviously $C$ is
unbounded. Then suppose $S \subseteq C$ and $\sup(S \cap \alpha) = \alpha$. Then $S \subseteq C_\beta$ for every $\beta > \alpha$, and
since each $C_\beta$ is closed, $\alpha \in C_\beta$, so $\alpha \in C$.


34.23 countable


A set $S$ is countable if there exists a bijection between $S$ and some subset of $\mathbb{N}$.
All finite sets are countable.


34.24 countably infinite


A set $S$ is countably infinite if there is a bijection between $S$ and $\mathbb{N}$.
As the name implies, any countably infinite set is both countable and infinite.
Countably infinite sets are also sometimes called denumerable.


34.25 finite


A set $S$ is finite if there exists a natural number $n$ and a bijection from $S$ to $n$. If there
exists such an $n$, then it is unique, and it is called the cardinality of $S$.


34.26 fixed points of normal functions

If $f : M \to \text{On}$ is a function then $\operatorname{Fix}(f) = \{x \in M \mid f(x) = x\}$ is the set of fixed points of
$f$. $f'$, the derivative of $f$, is the enumerating function of $\operatorname{Fix}(f)$.


If $f$ is $\kappa$-normal then $\operatorname{Fix}(f)$ is $\kappa$-closed and $\kappa$-normal, and therefore $f'$ is also $\kappa$-normal.


34.27 height of an algebraic number


Suppose we have an algebraic number such that the polynomial of smallest degree it is a
root of (with the coefficients relatively prime) is given by:


$\sum_{i=0}^{n} a_i x^i$


Then the height $h$ of the algebraic number is given by:


$h = n + \sum_{i=0}^{n} |a_i|$


This is a quantity which is used in the proof of the existence of transcendental numbers.
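

As a small worked example, the sketch below computes a height from a list of integer coefficients. It assumes the height is the degree plus the sum of the absolute values of the coefficients; the function name `height` and the convention that `coeffs[i]` is the coefficient of $x^i$ are assumptions made for illustration. The caller is trusted to pass the minimal polynomial with relatively prime coefficients; the code does not check this.

```python
def height(coeffs):
    """Height n + sum(|a_i|) for the polynomial sum(a_i * x**i).

    coeffs[i] is the integer coefficient a_i; the leading coefficient
    coeffs[-1] must be nonzero so that n = len(coeffs) - 1 is the degree.
    """
    if not coeffs or coeffs[-1] == 0:
        raise ValueError("leading coefficient must be nonzero")
    n = len(coeffs) - 1
    return n + sum(abs(a) for a in coeffs)

print(height([-2, 0, 1]))   # sqrt(2), root of x^2 - 2
print(height([-1, 2]))      # 1/2, root of 2x - 1
```

Only finitely many integer polynomials have a given height, which is what makes the quantity useful for counting algebraic numbers.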
34.28 if A is infinite and B is a finite subset of A, then
A \ B is infinite


Theorem. If $A$ is an infinite set and $B$ is a finite subset of $A$, then $A \setminus B$ is infinite.


Proof. The proof is by contradiction. If $A \setminus B$ were finite, there would exist a $k \in \mathbb{N}$
and a bijection $f : \{1, \ldots, k\} \to A \setminus B$. Since $B$ is finite, there also exists a bijection
$g : \{1, \ldots, l\} \to B$. We can then define a mapping $h : \{1, \ldots, k + l\} \to A$ by


$h(i) = \begin{cases} f(i) & \text{when } i \in \{1, \ldots, k\}, \\ g(i - k) & \text{when } i \in \{k + 1, \ldots, k + l\}. \end{cases}$


Since $f$ and $g$ are bijections, $h$ is a bijection between a finite subset of $\mathbb{N}$ and $A$. This is a
contradiction since $A$ is infinite.


34.29 limit cardinal


A limit cardinal is a cardinal $\kappa$ such that $\lambda^+ < \kappa$ for every cardinal $\lambda < \kappa$. Here $\lambda^+$ denotes
the cardinal successor of $\lambda$. If $2^\lambda < \kappa$ for every cardinal $\lambda < \kappa$, then $\kappa$ is called a strong limit
cardinal.


Every strong limit cardinal is a limit cardinal, because $\lambda^+ \le 2^\lambda$ holds for every cardinal $\lambda$.
Under GCH every limit cardinal is a strong limit cardinal because in this case $\lambda^+ = 2^\lambda$ for
every infinite cardinal $\lambda$.


The three smallest limit cardinals are 0, $\aleph_0$ and $\aleph_\omega$. Note that some authors do not count 0,
or sometimes even $\aleph_0$, as a limit cardinal.


34.30 natural number


Given the Zermelo-Fraenkel axioms of set theory, one can prove that there exists an inductive set
$X$ such that $\emptyset \in X$. The natural numbers $\mathbb{N}$ are then defined to be the intersection of all
subsets of $X$ which are inductive sets and contain the empty set as an element.


The first few natural numbers are:


• $0 := \emptyset$
• $1 := 0' = \{0\} = \{\emptyset\}$
• $2 := 1' = \{0, 1\} = \{\emptyset, \{\emptyset\}\}$
• $3 := 2' = \{0, 1, 2\} = \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}$


Note that the set 0 has zero elements, the set 1 has one element, the set 2 has two elements,
etc. Informally, the set $n$ is the set consisting of the $n$ elements $0, 1, \ldots, n - 1$, and $n$ is both
a subset of $\mathbb{N}$ and an element of $\mathbb{N}$.


In some contexts (most notably, in number theory), it is more convenient to exclude 0 from 
the set of natural numbers, so that N = {1,2,3,...}. When it is not explicitly specified, one 
must determine from context whether 0 is being considered a natural number or not. 


Addition of natural numbers is defined inductively as follows: 


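• $a + 0 := a$ for all $a \in \mathbb{N}$


• $a + b' := (a + b)'$ for all $a, b \in \mathbb{N}$


Multiplication of natural numbers is defined inductively as follows:


• $a \cdot 0 := 0$ for all $a \in \mathbb{N}$


• $a \cdot b' := (a \cdot b) + a$ for all $a, b \in \mathbb{N}$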
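

The set-theoretic construction and the two inductive definitions can be modelled with nested `frozenset` values. This is only a sketch: the names `succ`, `numeral`, `pred`, `add` and `mul` are assumptions, and `pred` relies on the fact that the predecessor of a nonzero $n = \{0, \ldots, n - 1\}$ is its element with the most members.

```python
EMPTY = frozenset()  # the natural number 0

def succ(n):
    """The successor n' = n ∪ {n}."""
    return n | frozenset([n])

def numeral(k):
    """The set representing the natural number k."""
    n = EMPTY
    for _ in range(k):
        n = succ(n)
    return n

def pred(n):
    """The b with b' = n, for n != 0: the largest element of n."""
    return max(n, key=len)

def add(a, b):
    """a + 0 := a;  a + b' := (a + b)'."""
    return a if b == EMPTY else succ(add(a, pred(b)))

def mul(a, b):
    """a · 0 := 0;  a · b' := (a · b) + a."""
    return EMPTY if b == EMPTY else add(mul(a, pred(b)), a)

two, three = numeral(2), numeral(3)
print(len(add(two, three)), len(mul(two, three)))
print(two <= three, two in three)  # 2 is both a subset and an element of 3
```

Since the set $n$ has exactly $n$ elements, `len` recovers the usual number from its set representation.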


The natural numbers form a monoid under either addition or multiplication. There is an
ordering relation on the natural numbers, defined by: $a \le b$ if $a \subseteq b$.


34.31 ordinal arithmetic


Ordinal arithmetic is the extension of ordinary arithmetic to the transfinite ordinals.
The successor operation $Sx$ (sometimes written $x + 1$, although this notation risks confusion
with the general definition of addition) is part of the definition of the ordinals, and addition
is naturally defined by recursion over this:

• $x + 0 = x$

• $x + Sy = S(x + y)$

• $x + \alpha = \sup_{\gamma < \alpha} (x + \gamma)$ for limit $\alpha$

If $x$ and $y$ are finite then $x + y$ under this definition is just the usual sum; however, when $x$ and
$y$ become infinite, there are differences. In particular, ordinal addition is not commutative.


For example,
$\omega + 1 = \omega + S0 = S(\omega + 0) = S\omega$


but
$1 + \omega = \sup_{n < \omega} (1 + n) = \omega$


Multiplication in turn is defined by iterated addition:


• $x \cdot 0 = 0$
• $x \cdot Sy = x \cdot y + x$
• $x \cdot \alpha = \sup_{\gamma < \alpha} (x \cdot \gamma)$ for limit $\alpha$

Once again this definition is equivalent to normal multiplication when $x$ and $y$ are finite, but
is not commutative:
$\omega \cdot 2 = \omega \cdot 1 + \omega = \omega + \omega$


but
$2 \cdot \omega = \sup_{n < \omega} (2 \cdot n) = \omega$
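

The failure of commutativity can be checked mechanically on the ordinals below $\omega^2$, each of which can be written as $\omega \cdot a + b$ with natural numbers $a$ and $b$. The representation `Ord(w, n)` and the function names below are assumptions made for this sketch; multiplication is only implemented for a finite right-hand factor, by iterating addition as in the successor clause of the definition.

```python
from typing import NamedTuple

class Ord(NamedTuple):
    """The ordinal omega * w + n, with natural numbers w and n."""
    w: int
    n: int

def add(x, y):
    """Ordinal addition below omega^2."""
    if y.w > 0:
        # The finite tail of x is absorbed: n + omega = omega.
        return Ord(x.w + y.w, y.n)
    return Ord(x.w, x.n + y.n)

def times_finite(x, k):
    """x * k for a natural number k, using x * 0 = 0 and x * Sy = x * y + x."""
    result = Ord(0, 0)
    for _ in range(k):
        result = add(result, x)
    return result

OMEGA, ONE = Ord(1, 0), Ord(0, 1)
print(add(OMEGA, ONE))         # omega + 1
print(add(ONE, OMEGA))         # 1 + omega = omega
print(times_finite(OMEGA, 2))  # omega * 2 = omega + omega
```

Products such as $2 \cdot \omega$ need the limit clause, which this finite representation does not attempt to evaluate.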


Both these are strongly increasing in the second argument and weakly increasing 
in the first argument. That is, if œ < 8 then 


eyta< 7+ 
ey a<y:p 
eat+y<B+y 
ea- ypg 
34.32 ordinal number

An ordinal number is a well ordered set $S$ such that, for every $x \in S$,
$$x = \{z \in S \mid z < x\}$$

(where $<$ is the ordering relation on $S$).
34.33 power set

Definition If $X$ is a set, then the power set of $X$ is the set whose elements are the subsets
of $X$. It is usually denoted as $P(X)$ or $2^X$.

1. If $X$ is a finite set, then $|2^X| = 2^{|X|}$. This motivates the notation $2^X$.

2. For an arbitrary set $X$, Cantor's theorem states two things about the power set: First,
there is no bijection between $X$ and $P(X)$. Second, the cardinality of $2^X$ is greater
than the cardinality of $X$.
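For a finite set the power set can be listed explicitly, which makes the count $2^{|X|}$ easy to confirm. This is a small sketch using the standard `itertools.combinations`; the function name `power_set` is an illustrative choice.

```python
from itertools import combinations

def power_set(xs):
    """Return all subsets of the finite collection xs as a set of frozensets."""
    items = list(set(xs))
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}
```

Subsets are stored as `frozenset` values because ordinary Python sets are not hashable and so cannot be elements of another set.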
34.34 proof of Fodor’s lemma

If we let $f^{-1} : \kappa \to P(S)$ be the inverse of $f$ restricted to $S$, then Fodor’s lemma is equivalent
to the claim that for any function such that $\alpha \in f^{-1}(\beta) \rightarrow \alpha > \beta$ there is some $\alpha \in S$ such
that $f^{-1}(\alpha)$ is stationary.


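Then if Fodor’s lemma is false, for every $\alpha \in S$ there is some club set $C_\alpha$ such that
$C_\alpha \cap f^{-1}(\alpha) = \emptyset$. Let $C = \Delta_{\alpha < \kappa} C_\alpha$. The club sets are closed under diagonal intersection,
so $C$ is also club and therefore there is some $\alpha \in S \cap C$. Then $\alpha \in C_\beta$ for each $\beta < \alpha$, and
so there can be no $\beta < \alpha$ such that $\alpha \in f^{-1}(\beta)$, so $f(\alpha) \ge \alpha$, a contradiction.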
34.35 proof of Schroeder-Bernstein theorem

We first prove as a lemma that for any $B \subseteq A$, if there is an injection $f : A \to B$, then there
is also a bijection $h : A \to B$.

Define a sequence $\{C_k\}_{k=0}^{\infty}$ of subsets of $A$ by $C_0 = A - B$ and for $k \ge 0$, $C_{k+1} = f(C_k)$.

If the $C_k$ are not pairwise disjoint then there are minimal integers $j$ and $k$ with $j < k$ and
$C_j \cap C_k$ nonempty. Then $k \ge 1$, and so $C_k \subseteq B$. $C_0 \cap B = \emptyset$, so $j > 0$. Thus $C_j = f(C_{j-1})$
and $C_k = f(C_{k-1})$. By assumption, $f$ is injective, so $C_{j-1} \cap C_{k-1}$ is nonempty, contradicting
the minimality of $j$. Hence the $C_k$ are pairwise disjoint.


Now let $C = \bigcup_{k=0}^{\infty} C_k$, and define $h : A \to B$ by

$$h(z) = \begin{cases} f(z), & z \in C \\ z, & z \notin C \end{cases}$$

If $z \in C$, then $h(z) = f(z) \in B$. But if $z \notin C$, then $z \in B$, and so $h(z) \in B$. Hence $h$ is
well-defined; $h$ is injective by construction. Let $b \in B$. If $b \notin C$, then $h(b) = b$. Otherwise,
$b \in C_k = f(C_{k-1})$ for some $k > 0$, and so there is some $a \in C_{k-1}$ such that $h(a) = f(a) = b$.

Thus $h$ is bijective; in particular, if $B = A$, then $h$ is simply the identity map on $A$.


To prove the theorem, suppose $f : S \to T$ and $g : T \to S$ are injective. Then the composition
$gf : S \to g(T)$ is also injective. By the lemma, there is a bijection $h' : S \to g(T)$. The
injectivity of $g$ implies that $g^{-1} : g(T) \to T$ exists and is bijective. Define $h : S \to T$ by
$h(z) = g^{-1}h'(z)$; this map is a bijection, and so $S$ and $T$ have the same cardinality.
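The construction in the lemma can be traced on a concrete case. The sketch below makes its own choices, none of which come from the text: $A$ is the set of natural numbers, $B = A - \{0\}$, and the injection is $f(n) = n + 2$. Then $C_0 = \{0\}$, $C_1 = \{2\}$, and so on, so $C$ is the set of even numbers.

```python
def f(n):
    """Illustrative injection from A = {0, 1, 2, ...} into B = A - {0}."""
    return n + 2

def in_B(n):
    return n >= 1

def in_C(z):
    """Decide whether z lies in some C_k by walking backwards through f."""
    while True:
        if not in_B(z):      # z is in C_0 = A - B
            return True
        if z < 2:            # z is in B but has no preimage under f
            return False
        z -= 2               # step to the unique preimage of z under f

def h(z):
    """The bijection of the lemma: f on C, the identity elsewhere."""
    return f(z) if in_C(z) else z
```

On this example $h$ shifts the even numbers by two and fixes the odd numbers, so every element of $B$ is hit exactly once even though $f$ itself misses $1$.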
34.36 proof of fixed points of normal functions

Suppose $f$ is a $\kappa$-normal function and consider any $\alpha < \kappa$ and define a sequence by $\alpha_0 = \alpha$
and $\alpha_{n+1} = f(\alpha_n)$. Let $\alpha_\omega = \sup_{n<\omega} \alpha_n$. Then, since $f$ is continuous,
$$f(\alpha_\omega) = \sup_{n<\omega} f(\alpha_n) = \sup_{n<\omega} \alpha_{n+1} = \alpha_\omega.$$

So $\mathrm{Fix}(f)$ is unbounded.

Suppose $N$ is a set of fixed points of $f$ with $|N| < \kappa$. Then

$$f(\sup N) = \sup_{\alpha \in N} f(\alpha) = \sup_{\alpha \in N} \alpha = \sup N$$

so $\sup N$ is also a fixed point of $f$, and therefore $\mathrm{Fix}(f)$ is closed.
Version: 1 Owner: Henry Author(s): Henry 
34.37 proof of the existence of transcendental numbers 


Cantor discovered this proof.

Lemma:

Consider a natural number $k$. Then the number of algebraic numbers of height $k$ is finite.

Proof:

To see this, note the sum in the definition of height is positive. Therefore:
$$n < k$$

where $n$ is the degree of the polynomial. For a polynomial of degree $n$, there are only $n + 1$
coefficients, and the sum of their moduli is $(k - n)$, and there is only a finite number of ways
of doing this (each such choice of coefficients gives a polynomial with at most $n$ roots). For every
polynomial with degree less than $n$, there are fewer ways. So the sum of all of these is also finite, and
this is the number of algebraic numbers with height $k$ (with some repetitions). The result
follows.


Proof of the main theorem:

You can start writing a list of the algebraic numbers because you can put all the ones with
height 1, then with height 2, etc., and write them in numerical order within those sets because
they are finite sets. This implies that the set of algebraic numbers is countable. However,
by diagonalisation, the set of real numbers is uncountable. So there are more real numbers
than algebraic numbers; the result follows.
34.38 proof of theorems in additively indecomposable

H is closed

Let $\{\alpha_i \mid i < \kappa\}$ be some increasing sequence of elements of $H$ and let $\alpha = \sup\{\alpha_i \mid i < \kappa\}$.
Then for any $x, y < \alpha$, it must be that $x < \alpha_i$ and $y < \alpha_j$ for some $i, j < \kappa$. But then
$x + y < \alpha_{\max\{i,j\}} < \alpha$.

H is unbounded

Consider any $\alpha$, and define a sequence by $\alpha_0 = S\alpha$ and $\alpha_{n+1} = \alpha_n + \alpha_n$. Let $\alpha_\omega = \sup_{n<\omega} \alpha_n$
be the limit of this sequence. If $x, y < \alpha_\omega$ then it must be that $x < \alpha_i$ and $y < \alpha_j$ for some
$i, j < \omega$, and therefore $x + y < \alpha_{\max\{i,j\}+1}$. Note that $\alpha_\omega$ is, in fact, the next element of $H$
since every element in the sequence is clearly additively decomposable.


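$f_H(\alpha) = \omega^\alpha$

Since $0$ is not in $H$, we have $f_H(0) = 1$.

For any $\alpha + 1$, we have that $f_H(\alpha + 1)$ is the least additively indecomposable number greater than
$f_H(\alpha)$. Let $\alpha_0 = S f_H(\alpha)$ and $\alpha_{n+1} = \alpha_n + \alpha_n = \alpha_n \cdot 2$. Then $f_H(\alpha + 1) = \sup_{n<\omega} \alpha_n =
\sup_{n<\omega} S f_H(\alpha) \cdot 2 \cdots 2 = f_H(\alpha) \cdot \omega$. The limit case is trivial since $H$ is closed unbounded, so $f_H$
is continuous.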
34.39 proof that the rationals are countable

Suppose we have a rational number $\alpha = p/q$ in lowest terms with $q > 0$. Define the “height”
of this number as $h(\alpha) = |p| + q$. For example, $h(0) = h(\frac{0}{1}) = 1$, $h(-1) = h(1) = 2$,
and $h(-2) = h(-\frac{1}{2}) = h(\frac{1}{2}) = h(2) = 3$. Note that the set of numbers with a given height
is finite. The rationals can now be partitioned into classes by height, and the numbers in
each class can be ordered by way of increasing numerators. Thus it is possible to assign
a natural number to each of the rationals by starting with $0, -1, 1, -2, -\frac{1}{2}, \frac{1}{2}, 2, -3, \ldots$ and
progressing through classes of increasing heights. This assignment constitutes a bijection
between $\mathbb{N}$ and $\mathbb{Q}$ and proves that $\mathbb{Q}$ is countable.

A corollary is that the irrational numbers are uncountable, since the union of the irrationals
and the rationals is $\mathbb{R}$, which is uncountable.
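The enumeration by height can be written as a generator. This is a sketch under the definitions above, using the standard `fractions.Fraction` and `math.gcd`; the name `rationals_by_height` is illustrative.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def rationals_by_height():
    """Yield every rational exactly once, class by class in order of height."""
    for height in count(1):
        # All p/q with |p| + q == height and q > 0, by increasing numerator p.
        for p in range(-(height - 1), height):
            q = height - abs(p)
            if gcd(abs(p), q) == 1:      # keep only fractions in lowest terms
                yield Fraction(p, q)

first_eight = list(islice(rationals_by_height(), 8))
```

The lowest-terms filter is what prevents repetitions: for instance $0/2$ has $|p| + q = 2$ but is skipped, so $0$ appears only in the class of height 1.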
34.40 stationary set

If $\kappa$ is a cardinal, $C \subseteq \kappa$, and $C$ intersects every club in $\kappa$ then $C$ is stationary. If $C$ is not
stationary then it is thin.
34.41 successor cardinal


A successor cardinal is a cardinal that is the cardinal successor of some cardinal. 
34.42 uncountable

Definition A set is uncountable if it is not countable. In other words, a set $S$ is uncountable
if there is no subset of $\mathbb{N}$ with the same cardinality as $S$.

1. All uncountable sets are infinite. However, the converse is not true. For instance, the
natural numbers and the rational numbers - although infinite - are both countable.

2. The real numbers form an uncountable set. The famous proof of this result is based
on Cantor’s diagonal argument.


Version: 2 Owner: matte Author(s): matte, vampyr 


34.43 von Neumann integer 


A von Neumann integer is not an integer, but instead a construction of a natural number
using some basic set notation. The von Neumann integers are defined inductively. The
von Neumann integer zero is defined to be the empty set, $\emptyset$, and there are no smaller von
Neumann integers. The von Neumann integer $N$ is then the set of all von Neumann integers
less than $N$. The set of von Neumann integers is the set of all finite von Neumann ordinals.

This form of construction from very basic notions of sets is applicable to various forms of
set theory (for instance, Zermelo-Fraenkel set theory). While this construction suffices to
define the set of natural numbers, a little more work must be done to define the set of all
integers.
Examples

$0 = \emptyset$

$1 = \{0\} = \{\emptyset\}$

$2 = \{0, 1\} = \{\emptyset, \{\emptyset\}\}$

$3 = \{0, 1, 2\} = \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}$

$\vdots$

$N = \{0, 1, \ldots, N-1\}$
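These nested sets can be built literally with Python's `frozenset`. The sketch below assumes the successor rule $k + 1 = k \cup \{k\}$ used for von Neumann ordinals; the function name `von_neumann` is illustrative.

```python
def von_neumann(n):
    """Return the von Neumann integer n as a nested frozenset."""
    result = frozenset()             # 0 is the empty set
    for _ in range(n):
        result = result | {result}   # successor: k + 1 = k ∪ {k}
    return result
```

With this representation the integer $n$ has exactly $n$ elements, and a smaller integer is both an element and a proper subset of a larger one.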
34.44 von Neumann ordinal

The von Neumann ordinal is a method of defining ordinals in set theory.

The von Neumann ordinal $\alpha$ is defined to be the well-ordered set containing the von Neumann
ordinals which precede $\alpha$. The set of finite von Neumann ordinals is known as the
von Neumann integers. Every well-ordered set is isomorphic to a von Neumann ordinal.

They can be constructed by transfinite recursion as follows:

• The empty set is 0.

• Given any ordinal $\alpha$, the ordinal $\alpha + 1$ (the successor of $\alpha$) is defined to be $\alpha \cup \{\alpha\}$.

• Given a set $A$ of ordinals, $\bigcup_{a \in A} a$ is an ordinal.

If an ordinal is the successor of another ordinal, it is a successor ordinal. If an ordinal is
neither 0 nor a successor ordinal then it is a limit ordinal. The first limit ordinal is named
$\omega$.

The class of ordinals is denoted $\mathrm{On}$.

The von Neumann ordinals have the convenient property that if $a < b$ then $a \in b$ and $a \subset b$.


Version: 5 Owner: Henry Author(s): Henry, Logan 




34.45 weakly compact cardinal 


Weakly compact cardinals are (large) cardinals which have a property related to the
syntactic compactness theorem for first order logic. Specifically, for any infinite cardinal $\kappa$,
consider the language $L_{\kappa,\kappa}$.

This language is identical to first order logic except that:

(a) infinite conjunctions and disjunctions of fewer than $\kappa$ formulas are allowed

(b) infinite strings of fewer than $\kappa$ quantifiers are allowed

The weak compactness theorem for $L_{\kappa,\kappa}$ states that if $\Delta$ is a set of sentences of $L_{\kappa,\kappa}$ such
that $|\Delta| = \kappa$ and any $\theta \subset \Delta$ with $|\theta| < \kappa$ is consistent then $\Delta$ is consistent.

A cardinal is weakly compact if the weak compactness theorem holds for $L_{\kappa,\kappa}$.
34.46 weakly compact cardinals and the tree property

A cardinal is weakly compact if and only if it is inaccessible and has the tree property.

Weak compactness implies tree property

Let $\kappa$ be a weakly compact cardinal and let $(T, <_T)$ be a $\kappa$ tree with all levels smaller than
$\kappa$. We define a theory in $L_{\kappa,\kappa}$ with, for each $x \in T$, a constant $c_x$, and a single unary relation
$B$. Then our theory $\Delta$ consists of the sentences:

• $\neg[B(c_x) \wedge B(c_y)]$ for every incompatible $x, y \in T$

• $\bigvee_{x \in T(\alpha)} B(c_x)$ for each $\alpha < \kappa$

It should be clear that $B$ represents membership in a cofinal branch, since the first class of
sentences asserts that no incompatible elements are both in $B$ while the second class states
that the branch intersects every level.


Clearly $|\Delta| = \kappa$, since there are $\kappa$ elements in $T$, and hence at most $\kappa \cdot \kappa = \kappa$ sentences
in the first group, and of course there are $\kappa$ levels and therefore $\kappa$ sentences in the second
group.

Now consider any $\Sigma \subseteq \Delta$ with $|\Sigma| < \kappa$. Fewer than $\kappa$ sentences of the second group are
included, so the $x$ whose constants $c_x$ occur in these sentences must all appear in $T(\alpha)$ for some
$\alpha < \kappa$. But since $T$ has branches of arbitrary height, $T(\alpha) \models \Sigma$.

Since $\kappa$ is weakly compact, it follows that $\Delta$ also has a model and that model obviously has
a set of $c_x$ such that $B(c_x)$ whose corresponding elements of $T$ intersect every level and are
compatible, therefore forming a cofinal branch of $T$, proving that $T$ is not Aronszajn.
34.47 Cantor’s theorem

Let $X$ be any set and $P(X)$ its power set. Cantor’s theorem states that there is no bijection
between $X$ and $P(X)$. Moreover the cardinality of $P(X)$ is strictly greater than that of $X$,
that is $|X| < |P(X)|$.
34.48 proof of Cantor’s theorem

The proof of this theorem is fairly simple using the following construction, which is central
to Cantor’s diagonal argument.

Consider a function $F : X \to P(X)$ from $X$ to its powerset. Then we define the set $Z \subseteq X$
as follows:
$$Z = \{x \in X \mid x \notin F(x)\}.$$

Suppose that $F$ is, in fact, a bijection. Then there must exist an $x \in X$ such that $F(x) = Z$.
But, by construction, we have the following contradiction:

$$x \in Z \Leftrightarrow x \notin F(x) \Leftrightarrow x \notin Z.$$

Hence $F$ cannot be a bijection between $X$ and $P(X)$.
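For a small finite set the diagonal construction can be checked exhaustively. The sketch below represents a map $F : X \to P(X)$ as a dictionary, which is an encoding chosen here for illustration, and confirms that the set $Z$ is never among the values of $F$.

```python
from itertools import combinations, product

def diagonal_set(X, F):
    """Z = {x in X : x not in F(x)} for a map F stored as a dict."""
    return frozenset(x for x in X if x not in F[x])

def every_map_misses_its_diagonal(X):
    """Check, for every map F : X -> P(X), that Z is not a value of F."""
    X = list(X)
    subsets = [frozenset(c)
               for r in range(len(X) + 1)
               for c in combinations(X, r)]
    for images in product(subsets, repeat=len(X)):   # all maps F : X -> P(X)
        F = dict(zip(X, images))
        if diagonal_set(X, F) in images:
            return False
    return True
```

The check shows slightly more than the theorem needs: no map $F$ is even surjective, since $Z$ is always a subset of $X$ that $F$ misses.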
34.49 additive

Let $\phi$ be some set function defined on an algebra of sets $\mathcal{A}$. We say that $\phi$ is additive if,
whenever $A$ and $B$ are disjoint sets in $\mathcal{A}$, we have

$$\phi(A \cup B) = \phi(A) + \phi(B).$$

Suppose $\mathcal{A}$ is a $\sigma$-algebra. Then, given any sequence $\langle A_i \rangle$ of disjoint sets in $\mathcal{A}$, if we have
$$\phi\left(\bigcup A_i\right) = \sum \phi(A_i)$$
we say that $\phi$ is countably additive or $\sigma$-additive.

Useful properties of an additive set function $\phi$ include the following:

1. $\phi(\emptyset) = 0$.

2. If $A \subseteq B$, then $\phi(A) \le \phi(B)$.

3. If $A \subseteq B$, then $\phi(B \setminus A) = \phi(B) - \phi(A)$.

4. Given $A$ and $B$, $\phi(A \cup B) + \phi(A \cap B) = \phi(A) + \phi(B)$.
34.50 antisymmetric

A relation $R$ on $A$ is antisymmetric iff $\forall x, y \in A$, $(xRy \wedge yRx) \rightarrow (x = y)$. The number
of possible antisymmetric relations on $A$ is $2^n 3^{\frac{n^2-n}{2}}$ out of the $2^{n^2}$ total possible relations,
where $n = |A|$.

Antisymmetric is not the same thing as “not symmetric”, as it is possible to have both at
the same time. However, a relation $R$ that is both antisymmetric and symmetric has the
condition that $xRy \Rightarrow x = y$. There are only $2^n$ such possible relations on $A$.

An example of an antisymmetric relation on $A = \{\circ, \times, \star\}$ would be $R = \{(\star, \star), (\times, \circ), (\circ, \star), (\star, \times)\}$.
One relation that isn’t antisymmetric is $R = \{(\times, \circ), (\star, \circ), (\circ, \star)\}$ because we have both $\star R \circ$
and $\circ R \star$, but $\circ \neq \star$.
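The two counting formulas can be confirmed by brute force for small $n$. This sketch enumerates every relation on $\{0, \ldots, n-1\}$ as a set of ordered pairs; the helper names are illustrative, and the enumeration is only practical for $n \le 3$ or so, since there are $2^{n^2}$ relations.

```python
from itertools import product

def count_relations(n, predicate):
    """Count the relations R on {0, ..., n-1} that satisfy predicate(R)."""
    pairs = [(a, b) for a in range(n) for b in range(n)]
    total = 0
    for bits in product([False, True], repeat=len(pairs)):
        R = {p for p, keep in zip(pairs, bits) if keep}
        total += bool(predicate(R))
    return total

def antisymmetric(R):
    return all(a == b or (b, a) not in R for (a, b) in R)

def symmetric(R):
    return all((b, a) in R for (a, b) in R)
```

The formula $2^n 3^{(n^2-n)/2}$ reflects the structure of the search: each diagonal pair is free (two choices), while each off-diagonal pair $\{(a,b),(b,a)\}$ allows three of its four subsets.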
34.51 constant function

Definition Suppose $X$ and $Y$ are sets and $f : X \to Y$ is a function. Then $f$ is a constant
function if $f(a) = f(b)$ for all $a, b$ in $X$.

Properties

1. The composition of a constant function with any function (for which composition is
defined) is a constant function.

2. A constant map between topological spaces is continuous.
34.52 direct image

Let $f : A \to B$ be a function and let $U \subset A$ be a subset. The direct image of $U$ is the set
$f(U) \subset B$ consisting of all elements of $B$ which equal $f(u)$ for some $u \in U$.
34.53 domain

Let $R$ be a binary relation. Then the set of all $x$ such that $xRy$ for some $y$ is called the domain of $R$.
That is, the domain of $R$ is the set of all first coordinates of the ordered pairs in $R$.
34.54 Dynkin system

Let $\Omega$ be a set, and $P(\Omega)$ be the power set of $\Omega$. A Dynkin system on $\Omega$ is a set $D \subset P(\Omega)$
such that

1. $\Omega \in D$

2. $A, B \in D$ and $A \subset B \Rightarrow B \setminus A \in D$

3. $A_n \in D$, $A_n \subset A_{n+1}$, $n \ge 1 \Rightarrow \bigcup_{k=1}^{\infty} A_k \in D$.

Let $A \subset P(\Omega)$ be a set, and consider

$$\Gamma = \{X : X \text{ is a Dynkin system and } A \subset X\}. \qquad (34.54.1)$$

We define the intersection of all the Dynkin systems containing $A$ as
$$D(A) := \bigcap_{X \in \Gamma} X \qquad (34.54.2)$$

One can easily verify that $D(A)$ is itself a Dynkin system and that it contains $A$. We call
$D(A)$ the Dynkin system generated by $A$. It is the “smallest” Dynkin system containing
$A$.

A Dynkin system which is also a $\pi$-system is a $\sigma$-algebra.
34.55 equivalence class

Let $S$ be a set with an equivalence relation $\sim$. An equivalence class of $S$ under $\sim$ is a subset
$T \subset S$ such that

• If $x \in T$ and $y \in S$, then $x \sim y$ if and only if $y \in T$

• If $S$ is nonempty, then $T$ is nonempty

For $x \in S$, the equivalence class containing $x$ is often denoted by $[x]$, so that
$$[x] := \{y \in S \mid x \sim y\}.$$

The set of all equivalence classes of $S$ under $\sim$ is defined to be the set of all subsets of $S$
which are equivalence classes of $S$ under $\sim$.

For any equivalence relation $\sim$, the set of all equivalence classes of $S$ under $\sim$ is a partition
of $S$, and this correspondence is a bijection between the set of equivalence relations on $S$
and the set of partitions of $S$ (consisting of nonempty sets).
34.56 fibre

Given a function $f : X \to Y$, a fibre is an inverse image of an element of $Y$. That is, given
$y \in Y$, $f^{-1}(\{y\}) = \{x \in X \mid f(x) = y\}$ is a fibre.

Example

Define $f : \mathbb{R}^2 \to \mathbb{R}$ by $f(x, y) = x^2 + y^2$. Then the fibres of $f$ consist of concentric circles
about the origin, the origin itself, and empty sets, depending on whether we look at the
inverse image of a positive number, zero, or a negative number respectively.
34.57 filtration

A filtration is a sequence of sets $A_1, A_2, \ldots, A_n$ with
$$A_1 \subset A_2 \subset \cdots \subset A_n.$$
If one considers the sets $A_1, \ldots, A_n$ as elements of a larger set which are partially ordered
by inclusion, then a filtration is simply a finite chain with respect to this partial ordering.

It should be noted that in some contexts the word “filtration” may also be employed to
describe an infinite chain.
34.58 finite character

A family $F$ of sets is of finite character if

1. For each $A \in F$, every finite subset of $A$ belongs to $F$;

2. If every finite subset of a given set $A$ belongs to $F$, then $A$ belongs to $F$.
34.59 fix (transformation actions)

Let $A$ be a set, and $T : A \to A$ a transformation of that set. We say that $x \in A$ is fixed by
$T$, or that $T$ fixes $x$, whenever
$$T(x) = x.$$

The subset of fixed elements is called the fixed set of $T$, and is frequently denoted as $A^T$.

We say that a subset $B \subset A$ is fixed by $T$ whenever all elements of $B$ are fixed by $T$, i.e.
$$B \subset A^T.$$
If this is so, $T$ restricts to the identity transformation on $B$.

The definition generalizes readily to a family of transformations with common domain
$$T_i : A \to A, \quad i \in I.$$

In this case we say that a subset $B \subset A$ is fixed, if it is fixed by all the elements of the
family, i.e. whenever
$$B \subset \bigcap_{i \in I} A^{T_i}.$$
34.60 function

Let $A$ and $B$ be sets. A function $f : A \to B$ is a relation $R$ from $A$ to $B$ such that

• For every $a \in A$, there exists $b \in B$ such that $(a, b) \in R$.

• If $a \in A$, $b_1, b_2 \in B$, and $(a, b_1) \in R$ and $(a, b_2) \in R$, then $b_1 = b_2$.

For $a \in A$, one usually denotes by $f(a)$ the unique element $b \in B$ such that $(a, b) \in R$. The
set $A$ is called the domain of $f$, and the set $B$ is called the codomain.
34.61 functional

Definition A functional $T$ is a function mapping a function space (often a vector space)
$V$ into a field of scalars $K$, typically taken to be $\mathbb{R}$ or $\mathbb{C}$.


Discussion Examples of functionals include the and A functional f is 
often indicated by the use of square brackets, $T[x]$ rather than $T(x)$.

The linear functionals are those functionals $T$ that satisfy

• $T(x + y) = T(x) + T(y)$

• $T(cx) = cT(x)$

for any $c \in K$, $x, y \in V$.
34.62 generalized cartesian product

Given any family of sets $\{A_j\}_{j \in J}$ indexed by an index set $J$, the generalized cartesian product
$$\prod_{j \in J} A_j$$
is the set of all functions
$$f : J \to \bigcup_{j \in J} A_j$$
such that $f(j) \in A_j$ for all $j \in J$.

For each $i \in J$, the projection map
$$\pi_i : \prod_{j \in J} A_j \to A_i$$
is the function defined by
$$\pi_i(f) := f(i).$$
34.63 graph

The graph of a function $f : X \to Y$ is the subset of $X \times Y$ given by $\{(x, f(x)) : x \in X\}$.
34.64 identity map

Definition If $X$ is a set, then the identity map in $X$ is the mapping that maps each
element in $X$ to itself.

Properties

1. An identity map is always a bijection.

2. Suppose $X$ has two topologies: $\tau_1$ and $\tau_2$. Then the identity mapping $I : (X, \tau_1) \to
(X, \tau_2)$ is continuous if and only if $\tau_1$ is finer than $\tau_2$, i.e., $\tau_2 \subset \tau_1$.

3. The identity map on the $n$-sphere is homotopic to the antipodal map $A : S^n \to S^n$ if
$n$ is odd [1].

REFERENCES

1. V. Guillemin, A. Pollack, Differential topology, Prentice-Hall Inc., 1974.
34.65 inclusion mapping

Definition Let $X$ be a subset of $Y$. Then the inclusion map from $X$ to $Y$ is the mapping
$$\iota : X \to Y$$
$$x \mapsto x.$$

In other words, the inclusion map is simply a fancy way to say that every element in $X$ is
also an element in $Y$.

To indicate that a mapping is an inclusion mapping, one usually writes $\hookrightarrow$ instead of $\to$
when defining or mentioning an inclusion map. This hooked arrow symbol $\hookrightarrow$ can be seen
as a combination of the symbols $\subset$ and $\to$. In the above definition, we have not used this
convention. However, examples of this convention would be:

• Let $\iota : X \hookrightarrow Y$ be the inclusion map from $X$ to $Y$.

• We have the inclusion $S^n \hookrightarrow \mathbb{R}^{n+1}$.
34.66 inductive set

An inductive set is a set $X$ with the property that, for every $x \in X$, the successor $x'$ of $x$ is
also an element of $X$.

One major example of an inductive set is the set of natural numbers $\mathbb{N}$.
34.67 invariant

Let $A$ be a set, and $T : A \to A$ a transformation of that set. We say that $x \in A$ is an
invariant of $T$ whenever $x$ is fixed by $T$:
$$T(x) = x.$$
We say that a subset $B \subset A$ is invariant with respect to $T$ whenever
$$T(B) \subset B.$$
If this is so, the restriction of $T$ is a well-defined transformation of the invariant subset:

$$T|_B : B \to B.$$

The definition generalizes readily to a family of transformations with common domain
$$T_i : A \to A, \quad i \in I.$$

In this case we say that a subset is invariant, if it is invariant with respect to all elements of
the family.
34.68 inverse function theorem

Let $f$ be a continuously differentiable, vector-valued function mapping the open set $E \subset \mathbb{R}^n$
to $\mathbb{R}^n$ and let $S = f(E)$. If, for some point $a \in E$, the Jacobian, $|J_f(a)|$, is non-zero, then
there is a uniquely defined function $g$ and two open sets $X \subset E$ and $Y \subset S$ such that

1. $a \in X$, $f(a) \in Y$;

2. $Y = f(X)$;

3. $f : X \to Y$ is one-one;

4. $g$ is continuously differentiable on $Y$ and $g(f(x)) = x$ for all $x \in X$.

Simplest case

When $n = 1$, this theorem becomes: Let $f$ be a continuously differentiable, real-valued
function defined on the open interval $I$. If for some point $a \in I$, $f'(a) \neq 0$, then there
is a neighbourhood $[\alpha, \beta]$ of $a$ in which $f$ is strictly monotonic. Then $y \to f^{-1}(y)$ is a
continuously differentiable, strictly monotonic function from $[f(\alpha), f(\beta)]$ to $[\alpha, \beta]$. If $f$ is
increasing (or decreasing) on $[\alpha, \beta]$, then so is $f^{-1}$ on $[f(\alpha), f(\beta)]$.

Note

The inverse function theorem is a special case of the implicit function theorem where the
dimension of each variable is the same.
34.69 inverse image

Let $f : A \to B$ be a function, and let $U \subset B$ be a subset. The inverse image of $U$ is the
set $f^{-1}(U) \subset A$ consisting of all elements $a \in A$ such that $f(a) \in U$.

The inverse image commutes with all set operations: For any collection $\{U_i\}_{i \in I}$ of subsets of
$B$, we have the following identities for

1. unions:
$$f^{-1}\left(\bigcup_{i \in I} U_i\right) = \bigcup_{i \in I} f^{-1}(U_i)$$

2. intersections:
$$f^{-1}\left(\bigcap_{i \in I} U_i\right) = \bigcap_{i \in I} f^{-1}(U_i)$$

and for any subsets $U$ and $V$ of $B$, we have identities for

3. complements:
$$\left(f^{-1}(U)\right)^c = f^{-1}(U^c)$$

4. set differences:
$$f^{-1}(U \setminus V) = f^{-1}(U) \setminus f^{-1}(V)$$

5. symmetric differences:
$$f^{-1}(U \,\triangle\, V) = f^{-1}(U) \,\triangle\, f^{-1}(V)$$

In addition, for $X \subset A$ and $Y \subset B$, the inverse image satisfies the miscellaneous identities

6. $(f|_X)^{-1}(Y) = X \cap f^{-1}(Y)$

7. $f(f^{-1}(Y)) = Y \cap f(A)$

8. $X \subset f^{-1}(f(X))$, with equality if $f$ is injective
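Several of these identities can be spot-checked on a finite example. The sketch below picks its own data for illustration: $A = \{-3, \ldots, 3\}$, the squaring map $f(a) = a^2$, and two subsets `U` and `V` of the codomain.

```python
def image(f, X):
    """Direct image f(X) of a finite set X."""
    return {f(x) for x in X}

def preimage(f, A, U):
    """Inverse image of U under f : A -> B, with the domain A given explicitly."""
    return {a for a in A if f(a) in U}

A = set(range(-3, 4))
f = lambda a: a * a           # deliberately not injective on A
U, V = {0, 1, 4}, {4, 9, 16}
X = {1, 2}
```

Because $f$ is not injective here, identity 8 holds only as a proper inclusion: `preimage(f, A, image(f, X))` also contains $-1$ and $-2$.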
34.70 mapping

Synonym of function, although typical usage suggests that mapping is the more generic
term.

In a geometric context, the term function is often employed to connote a mapping whose
purpose is to assign values to the elements of its domain, i.e. a function defines a field of
values, whereas mapping seems to have a more geometric connotation, as in a mapping of
one space to another.
34.71 mapping of period n is a bijection

Theorem Suppose $X$ is a set. Then a mapping $f : X \to X$ of period $n$ is a bijection.

Proof. If $n = 1$, the claim is trivial; $f$ is the identity mapping. Suppose $n = 2, 3, \ldots$. Then
for any $x \in X$, we have $x = f(f^{n-1}(x))$, so $f$ is a surjection. To see that $f$ is an injection,
suppose $f(x) = f(y)$ for some $x, y$ in $X$. Applying $f^{n-1}$ to both sides gives $f^n(x) = f^n(y)$, and
since $f^n$ is the identity, it follows that $x = y$. $\Box$
34.72 partial function

A function $f : A \to B$ is sometimes called a total function, to signify that $f(a)$ is defined
for every $a \in A$. If $C$ is any set such that $C \supseteq A$ then $f$ is also a partial function from $C$
to $B$.

Clearly if $f$ is a function from $A$ to $B$ then it is a partial function from $A$ to $B$, but a partial
function need not be defined for every element of its domain.
34.73 partial mapping

Let $X_1, \cdots, X_n$ and $Y$ be sets, and let $f$ be a function of $n$ variables: $f : X_1 \times X_2 \times \cdots \times X_n \to
Y$. Fix $x_i \in X_i$ for $2 \le i \le n$. The induced mapping $a \mapsto f(a, x_2, \ldots, x_n)$ is called the partial
mapping determined by $f$ corresponding to the first variable.

In the case where $n = 2$, the map defined by $a \mapsto f(a, x)$ is often denoted $f(\cdot, x)$. Further,
any function $f : X_1 \times X_2 \to Y$ determines a mapping from $X_1$ into the set of mappings
of $X_2$ into $Y$, namely $\bar{f} : x \mapsto (y \mapsto f(x, y))$. The converse holds too, and it is customary
to identify $\bar{f}$ with $f$. Many of the “canonical isomorphisms” that we come across (e.g. in


are illustrations of this kind of identification. 
34.74 period of mapping

Definition Suppose $X$ is a set and $f$ is a mapping $f : X \to X$. If $f^n$ is the identity mapping
on $X$ for some $n = 1, 2, \ldots$, then $f$ is said to be a mapping of period $n$. Here, the notation
$f^n$ means the $n$-fold composition $f \circ \cdots \circ f$.

Examples

1. A mapping $f$ is of period 1 if and only if $f$ is the identity mapping.

2. Suppose $V$ is a vector space. Then a linear involution $L : V \to V$ is a mapping of
period 2. For example, the reflection mapping $x \mapsto -x$ is a mapping of period 2.

3. In the complex plane, the mapping $z \mapsto e^{-2\pi i/n} z$ is a mapping of period $n$ for $n =
1, 2, \ldots$.

4. Let us consider the function space spanned by the trigonometric functions sin and cos.
On this space, the derivative is a mapping of period 4.

Properties

1. Suppose $X$ is a set. Then a mapping $f : X \to X$ of period $n$ is a bijection. (See the entry
“mapping of period n is a bijection” for the proof.)

2. Suppose $X$ is a topological space. Then a continuous mapping $f : X \to X$ of period $n$
is a homeomorphism.
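Example 4 can be made concrete by working in coordinates. The sketch below stores the function $a \sin + b \cos$ as the pair `(a, b)`, an encoding chosen here for illustration, so that differentiation becomes a map on pairs.

```python
def derivative(coeffs):
    """d/dx of a*sin + b*cos, with the function stored as the pair (a, b)."""
    a, b = coeffs
    return (-b, a)           # (a sin + b cos)' = a cos - b sin

def iterate(f, n, x):
    """Apply f to x a total of n times, i.e. compute the n-fold composition."""
    for _ in range(n):
        x = f(x)
    return x
```

Applying `derivative` four times returns every pair to itself, while two applications give the reflection $x \mapsto -x$ from example 2.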
34.75 pi-system

Let $\Omega$ be a set, and $P(\Omega)$ be the power set of $\Omega$. A $\pi$-system (or pi-system) on $\Omega$ is a set
$F \subset P(\Omega)$ such that

$$A, B \in F \Rightarrow A \cap B \in F. \qquad (34.75.1)$$

A $\pi$-system is closed under finite intersection.
34.76 proof of inverse function theorem

Since $\det Df(a) \neq 0$ the Jacobian matrix $Df(a)$ is invertible: let $A = (Df(a))^{-1}$ be its inverse.
Choose $r > 0$ and $\rho > 0$ such that

$$B = \overline{B_\rho(a)} \subset E,$$

$$\|Df(x) - Df(a)\| \le \frac{1}{2\|A\|} \quad \forall x \in B,$$

$$r \le \frac{\rho}{2\|A\|}.$$

Let $y \in B_r(f(a))$ and consider the mapping
$$T_y : B \to \mathbb{R}^n$$

$$T_y(x) = x + A \cdot (y - f(x)).$$


If $x \in B$ we have
$$\|DT_y(x)\| = \|1 - A \cdot Df(x)\| \le \|A\| \cdot \|Df(a) - Df(x)\| \le \frac{1}{2}.$$

Let us verify that $T_y$ is a contraction mapping. Given $x_1, x_2 \in B$, by the mean-value theorem
on $\mathbb{R}^n$ we have

$$|T_y(x_1) - T_y(x_2)| \le \sup_{x \in [x_1, x_2]} \|DT_y(x)\| \cdot |x_1 - x_2| \le \frac{1}{2} |x_1 - x_2|.$$

Also notice that $T_y(B) \subset B$. In fact, given $x \in B$,

$$|T_y(x) - a| \le |T_y(x) - T_y(a)| + |T_y(a) - a| \le \frac{1}{2}|x - a| + |A \cdot (y - f(a))| \le \frac{\rho}{2} + \|A\| r \le \rho.$$
So $T_y : B \to B$ is a contraction mapping and hence by the contraction principle there exists
one and only one solution to the equation
$$T_y(x) = x,$$
i.e. $x$ is the only point in $B$ such that $f(x) = y$.

Hence given any $y \in B_r(f(a))$ we can find $x \in B$ which solves $f(x) = y$. Let us call
$g : B_r(f(a)) \to B$ the mapping which gives this solution, i.e.

$$f(g(y)) = y.$$

Let $V = B_r(f(a))$ and $U = g(V)$. Clearly $f : U \to V$ is one to one and the inverse of $f$
is $g$. We have to prove that $U$ is a neighbourhood of $a$. However since $f$ is continuous in
$a$ we know that there exists a ball $B_\delta(a)$ such that $f(B_\delta(a)) \subset B_r(f(a))$ and hence we have
$B_\delta(a) \subset U$.


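We now want to study the differentiability of $g$. Let $y \in V$ be any point, take $w \in \mathbb{R}^n$ and
$\epsilon > 0$ so small that $y + \epsilon w \in V$. Let $x = g(y)$ and define $v(\epsilon) = g(y + \epsilon w) - g(y)$.

First of all notice that being

$$|T_y(x + v(\epsilon)) - T_y(x)| \le \frac{1}{2}|v(\epsilon)|$$

we have
$$\frac{1}{2}|v(\epsilon)| \ge |v(\epsilon) - \epsilon A \cdot w| \ge |v(\epsilon)| - \epsilon \|A\| \cdot |w|$$

and hence
$$|v(\epsilon)| \le 2\epsilon \|A\| \cdot |w|.$$

On the other hand we know that $f$ is differentiable in $x$, that is we know that for all $v$ it
holds
$$f(x + v) - f(x) = Df(x) \cdot v + h(v)$$

with $\lim_{v \to 0} h(v)/|v| = 0$. So we get

$$\frac{|h(v(\epsilon))|}{\epsilon} \le \frac{2\|A\| \cdot |w| \cdot |h(v(\epsilon))|}{|v(\epsilon)|} \to 0 \quad \text{when } \epsilon \to 0.$$

So

$$\lim_{\epsilon \to 0} \frac{g(y + \epsilon w) - g(y)}{\epsilon} = \lim_{\epsilon \to 0} \frac{v(\epsilon)}{\epsilon} = \lim_{\epsilon \to 0} Df(x)^{-1} \cdot \frac{\epsilon w - h(v(\epsilon))}{\epsilon} = Df(x)^{-1} \cdot w$$

that is

$$Dg(y) = Df(x)^{-1}.$$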
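The contraction $T_y$ used in the proof also gives a practical way to compute $g(y)$: iterate it from any starting point in $B$. The sketch below is a one-dimensional illustration under assumptions made here, not taken from the text: $f = \exp$, $a = 0$ (so $Df(a) = 1$ and $A = 1$), and $y = 1.2$, which is close to $f(a) = 1$.

```python
import math

def solve_by_contraction(f, A, y, x0, steps=100):
    """Iterate T_y(x) = x + A*(y - f(x)) starting from x0 and return the result."""
    x = x0
    for _ in range(steps):
        x = x + A * (y - f(x))
    return x

# Illustrative data: f = exp, a = 0, A = 1 / f'(0) = 1, target value y = 1.2.
x = solve_by_contraction(math.exp, 1.0, 1.2, 0.0)
```

The fixed point of the iteration satisfies $f(x) = y$, so here `x` approximates $\log 1.2$. Note that $A$ is computed once at the base point $a$ and never updated, which is what distinguishes this scheme from Newton's method.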
34.77 proper subset

Let $S$ be a set and let $X \subset S$ be a subset. We say $X$ is a proper subset of $S$ if $X \neq S$.
34.78 range

Let $R$ be a binary relation. Then the set of all $y$ such that $xRy$ for some $x$ is called the
range of $R$. That is, the range of $R$ is the set of all second coordinates in the ordered pairs
of $R$.

In terms of functions, this means that the range of a function is the full set of values it can
take on (the outputs), given the full set of parameters (the inputs). Note that the range is
a subset of the codomain.


Version: 2 Owner: akrowne Author(s): akrowne 


34.79 reflexive

A relation $R$ on $A$ is reflexive if and only if $\forall a \in A$, $aRa$. The number of possible reflexive
relations on $A$ is $2^{n^2-n}$ out of the $2^{n^2}$ total possible relations, where $n = |A|$.

For example, let $A = \{1, 2, 3\}$ and let $R$ be a relation on $A$. Then $R = \{(1, 1), (2, 2), (3, 3), (1, 3), (3, 2)\}$
would be a reflexive relation, because it contains all the pairs $(a, a)$, $a \in A$. However,
$R = \{(1, 1), (2, 2), (2, 3), (3, 1)\}$ is not reflexive because it would also have to contain $(3, 3)$.
34.80 relation

A relation is any subset of a cartesian product of two sets $A$ and $B$. That is, any $R \subset A \times B$
is a binary relation. One may write $aRb$ to denote $a \in A$, $b \in B$ and $(a, b) \in R$. A subset of
$A \times A$ is simply called a relation on $A$.

An example of a relation is the less-than relation on integers, i.e. $< \,\subset \mathbb{Z} \times \mathbb{Z}$. $(1, 2) \in\, <$,
but $(2, 1) \notin\, <$.
34.81 restriction of a mapping

Definition Let $f : X \to Y$ be a mapping from a set $X$ to a set $Y$. If $A$ is a subset of $X$,
then the restriction of $f$ to $A$ is the mapping
$$f|_A : A \to Y$$
$$a \mapsto f(a).$$
34.82 set difference

Let $A$ and $B$ be sets in some ambient set $X$. The set difference, or simply difference, between
$A$ and $B$ (in that order) is the set of all elements that are contained in $A$, but not in $B$.
This set is denoted by $A \setminus B$, and we have

$$A \setminus B = \{x \in X \mid x \in A, x \notin B\} = A \cap B^c$$
where $B^c$ is the complement of $B$ in $X$.

Remark

Sometimes the set difference is also written as $A - B$. However, if $A$ and $B$ are sets in a
vector space, then $A - B$ is commonly used to denote the set
$$A - B = \{a - b \mid a \in A, b \in B\},$$

which, in general, is not the same as the set difference of $A$ and $B$. Therefore, to avoid
confusion, one should try to avoid the notation $A - B$ for the set difference.
34.83 symmetric


A relation R on A is symmetric iff ∀x, y ∈ A, xRy → yRx. The number of possible
symmetric relations on A is 2^((n² + n)/2) out of the 2^(n²) total possible relations, where n = |A|.


An example of a symmetric relation on A = {a, b, c} would be R = {(a, a), (c, b), (b, c), (a, c), (c, a)}.
One relation that is not symmetric is R = {(b, b), (a, b), (b, a), (c, b)}: since it contains
(c, b), it would also have to contain (b, c) in order to be symmetric.
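

The count of symmetric relations can be checked by brute force for very small n. The sketch below is illustrative only (the function names are not from the text); it enumerates every subset of A × A, so it is practical only for n up to about 3.

```python
from itertools import product


def is_symmetric(R):
    """A relation R (a set of pairs) is symmetric if (x, y) in R implies (y, x) in R."""
    return all((y, x) in R for (x, y) in R)


def count_symmetric_relations(n):
    """Brute-force count of the symmetric relations on an n-element set."""
    pairs = list(product(range(n), repeat=2))
    count = 0
    # Each bitmask selects one subset of A x A, i.e. one relation on A.
    for mask in range(2 ** len(pairs)):
        R = {pairs[k] for k in range(len(pairs)) if (mask >> k) & 1}
        if is_symmetric(R):
            count += 1
    return count
```

For n = 2 and n = 3 the result can be compared with 2 ** ((n * n + n) // 2).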
34.84 symmetric difference


The symmetric difference between two sets A and B, written A △ B, is the set of all
x such that either x ∈ A or x ∈ B but not both. It is equal to (A − B) ∪ (B − A) and to
(A ∪ B) − (A ∩ B).


The symmetric difference is commutative since A △ B = (A − B) ∪ (B − A) =
(B − A) ∪ (A − B) = B △ A.


The operation is also associative. To see this, consider three sets A, B, and C. Any given
element x is in zero, one, two, or all three of these sets. If x is not in any of A, B, or C,
then it is not in the symmetric difference of the three sets no matter how it is computed.
If x is in one of the sets, let that set be A; then x ∈ A △ B and x ∈ (A △ B) △ C; also,
x ∉ B △ C and therefore x ∈ A △ (B △ C). If x is in two of the sets, let them be A
and B; then x ∉ A △ B and x ∉ (A △ B) △ C; also, x ∈ B △ C, but because x is in A,
x ∉ A △ (B △ C). If x is in all three, then x ∉ A △ B but x ∈ (A △ B) △ C; similarly,
x ∉ B △ C but x ∈ A △ (B △ C). The remaining choices of which sets contain x are handled
in the same way. Thus, A △ (B △ C) = (A △ B) △ C.


In general, an element will be in the symmetric difference of several sets iff it is in an
odd number of the sets.
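

The parity description above can be tried out on small finite sets. The following is a minimal sketch, assuming Python's built-in set type, whose ^ operator computes the symmetric difference; the helper names and the sample sets are made up for illustration.

```python
from functools import reduce


def sym_diff_all(sets):
    """Fold the symmetric difference (the ^ operator on sets) over a list of sets."""
    return reduce(lambda s, t: s ^ t, sets, set())


def in_odd_number(x, sets):
    """Return True if x belongs to an odd number of the given sets."""
    return sum(x in s for s in sets) % 2 == 1


# Three arbitrary sample sets.
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {1, 3, 5, 7}
```

On these samples, (A ^ B) ^ C and A ^ (B ^ C) agree, and both consist of exactly the elements lying in an odd number of the three sets.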
34.85 the inverse image commutes with set operations


Theorem. Let f be a mapping from X to Y. If {B_i}_{i∈I} is a (possibly uncountable) collection
of subsets in Y, then the following relations hold for the inverse image:


(1) f⁻¹(⋃_{i∈I} B_i) = ⋃_{i∈I} f⁻¹(B_i)


(2) f⁻¹(⋂_{i∈I} B_i) = ⋂_{i∈I} f⁻¹(B_i)


If A and B are subsets in Y, then we also have:


(3) For the set complement, (f⁻¹(A))^c = f⁻¹(A^c).


(4) For the set difference, f⁻¹(A \ B) = f⁻¹(A) \ f⁻¹(B).


(5) For the symmetric difference, f⁻¹(A △ B) = f⁻¹(A) △ f⁻¹(B).


Proof. For part (1), we have


f⁻¹(⋃_{i∈I} B_i) = {x ∈ X | f(x) ∈ ⋃_{i∈I} B_i}
                 = {x ∈ X | f(x) ∈ B_i for some i ∈ I}
                 = ⋃_{i∈I} {x ∈ X | f(x) ∈ B_i}
                 = ⋃_{i∈I} f⁻¹(B_i).


Similarly, for part (2), we have


f⁻¹(⋂_{i∈I} B_i) = {x ∈ X | f(x) ∈ ⋂_{i∈I} B_i}
                 = {x ∈ X | f(x) ∈ B_i for all i ∈ I}
                 = ⋂_{i∈I} {x ∈ X | f(x) ∈ B_i}
                 = ⋂_{i∈I} f⁻¹(B_i).


For the set complement, suppose x ∉ f⁻¹(A). This is equivalent to f(x) ∉ A, or f(x) ∈ A^c,
which is equivalent to x ∈ f⁻¹(A^c). Since the set difference A \ B can be written as A ∩ B^c,
part (4) follows from parts (2) and (3). Similarly, since A △ B = (A \ B) ∪ (B \ A), part (5)
follows from parts (1) and (4). □
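

The five identities can be spot-checked on a small finite example. This is only a sketch under stated assumptions: preimage is a hypothetical helper, and the sets X, Y and the squaring map f are chosen arbitrarily for the illustration.

```python
def preimage(f, X, B):
    """Inverse image of B under f, where f is a Python function with domain X."""
    return {x for x in X if f(x) in B}


X = set(range(-3, 4))
Y = {0, 1, 4, 9, 16}


def f(x):
    """A sample mapping from X into Y (squaring)."""
    return x * x


# Two sample subsets of Y.
A, B = {0, 1, 4}, {4, 9}
```

A finite check of this kind proves nothing in general, but it is a quick way to catch a misremembered identity; note that the analogous statements for direct images fail for intersections.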
34.86 transformation


Synonym of mapping and function. Often used to refer to mappings where the domain and
codomain are the same set, i.e. one can compose a transformation with itself. For example,
when one speaks of a transformation of a space, one refers to some deformation of that space.
34.87 transitive


Let A be a set. A is said to be transitive if whenever x ∈ A then x ⊆ A.
Equivalently, A is transitive if whenever x ∈ A and y ∈ x then y ∈ A.
34.88 transitive


A relation R on A is transitive if and only if ∀x, y, z ∈ A, (xRy ∧ yRz) → (xRz).


For example, the “is a subset of” relation ⊆ between sets is transitive. The “is not equal
to” relation ≠ between integers is not transitive. If we assign to our definition x = 5, y = 42,
and z = 5, we know that both 5 ≠ 42 (x ≠ y) and 42 ≠ 5 (y ≠ z). However, 5 = 5 (x = z),
so ≠ is not transitive.
34.89 transitive closure


The transitive closure of a set X is the smallest transitive set tc(X) such that X ⊆ tc(X).
The transitive closure of a set can be constructed as follows:


Define a function f on ω by f(0) = X and f(n + 1) = ⋃ f(n). Then


tc(X) = ⋃_{n<ω} f(n).
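

For hereditarily finite sets the construction terminates after finitely many steps and can be carried out directly. The sketch below models sets as nested frozensets; the function names and the sample set X are assumptions made for the illustration.

```python
def big_union(X):
    """Union of all elements of X (each element is itself a frozenset)."""
    return frozenset(y for x in X for y in x)


def transitive_closure(X):
    """tc(X) as the union of f(n), where f(0) = X and f(n + 1) = union of f(n)."""
    result, level = set(), frozenset(X)
    while level:  # for hereditarily finite sets the levels eventually become empty
        result |= level
        level = big_union(level)
    return frozenset(result)


def is_transitive(A):
    """A is transitive if every element of A is also a subset of A."""
    return all(x <= A for x in A)


EMPTY = frozenset()
ONE = frozenset({EMPTY})             # the set {∅}
X = frozenset({frozenset({ONE})})    # the set {{{∅}}}
```

Here X itself is not transitive, while transitive_closure(X) collects {{∅}}, {∅} and ∅.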
34.90 Hausdorff’s maximum principle


Theorem Let X be a partially ordered set. Then there exists a maximal totally ordered
subset of X.


Hausdorff’s maximum principle is one of the many theorems equivalent to the axiom of choice.
The proof below uses Zorn’s lemma, which is also equivalent to the axiom of choice.


Proof. Let S be the set of all totally ordered subsets of X. S is not empty, since the
empty set is an element of S. Given a subset T of S that is totally ordered by inclusion,
the union of all the elements of T is again an element of S, as is easily verified. This shows
that S, ordered by inclusion, is inductive. The result now follows from Zorn’s lemma.
34.91 Kuratowski’s lemma


Any chain in an ordered set is contained in a maximal chain.
This proposition is equivalent to the axiom of choice.
34.92 Tukey’s lemma


Each nonempty family of finite character has a maximal element.


Here, by a maximal element we mean a maximal element with respect to the inclusion
ordering: A ≤ B iff A ⊆ B. This lemma is equivalent to the axiom of choice.
34.93 Zermelo’s postulate


If F is a disjoint family of nonempty sets, then there is a set C which has exactly one element
of each A ∈ F (i.e. such that A ∩ C is a singleton for each A ∈ F).


This is one of the many propositions which are equivalent to the axiom of choice.
34.94 Zermelo’s well-ordering theorem


If X is any set whatsoever, then there exists a well-ordering of X. The well-ordering theorem
is equivalent to the axiom of choice.
34.95 Zorn’s lemma


Let X be a partially ordered set, and suppose that every chain in X has an upper bound.
Then X has a maximal element x, in the sense that for all y ∈ X, y ≯ x.


Zorn’s lemma is equivalent to the axiom of choice.
34.96 axiom of choice


Let C be a collection of nonempty sets. Then there exists a function f with domain
C such that f(x) ∈ x for all x ∈ C. f is sometimes called a choice function on C.
The axiom of choice is commonly (although not universally) accepted with the axioms of
Zermelo-Fraenkel set theory. The axiom of choice is equivalent to the well-ordering principle
and to Zorn’s lemma.


The axiom of choice is sometimes called the multiplicative axiom, as it is equivalent to
the proposition that a product of cardinals is zero if and only if one of the factors is zero.
34.97 equivalence of Hausdorff’s maximum principle,
Zorn’s lemma and the well-ordering theorem


Hausdorff’s maximum principle implies Zorn’s lemma. Consider a partially ordered set
X, where every chain has an upper bound. According to the maximum principle there exists
a maximal totally ordered subset Y ⊆ X. This then has an upper bound, x. If x were not
the largest element in Y then {x} ∪ Y would be a totally ordered set in which Y would be
properly contained, contradicting the maximality of Y. Likewise, if some z ∈ X satisfied
x < z, then {z} ∪ Y would be a totally ordered set properly containing Y. Thus x is a
maximal element in X.


Zorn’s lemma implies the well-ordering theorem. Let X be any non-empty set, and
let 𝒜 be the collection of pairs (A, ≤), where A ⊆ X and ≤ is a well-ordering on A. Define
a relation ⪯ on 𝒜 so that for all x, y ∈ 𝒜: x ⪯ y iff x equals an initial segment of y. It is easy
to see that this defines a partial order relation on 𝒜 (it inherits reflexivity, antisymmetry
and transitivity from one set being an initial segment and thus a subset of the other).


For each chain C ⊆ 𝒜, define C′ = (R, ≤′) where R is the union of all the sets A for all
(A, ≤) ∈ C, and ≤′ is the union of all the relations ≤ for all (A, ≤) ∈ C. It follows that C′
is an upper bound for C in 𝒜.


According to Zorn’s lemma, 𝒜 now has a maximal element, (M, ≤_M). We claim that M
contains all members of X, for if this were not true we could for any a ∈ X − M construct
(M*, ≤*) where M* = M ∪ {a} and ≤* extends ≤_M so that the initial segment of M* determined
by a is all of M (that is, a is placed above every element of M). Clearly ≤* then defines
a well-order on M*, and (M*, ≤*) would be larger than (M, ≤_M), contrary to the definition.


Since M contains all the members of X and ≤_M is a well-ordering of M, it is also a
well-ordering on X as required.


The well-ordering theorem implies Hausdorff’s maximum principle. Let (X, ≤)
be a partially ordered set, and let ⪯ be a well-ordering on X. We define the function φ by
transfinite recursion over (X, ⪯) so that


φ(a) = {a} if {a} ∪ ⋃_{b≺a} φ(b) is totally ordered under ≤,
φ(a) = ∅ otherwise.


It follows that ⋃_{x∈X} φ(x) is a maximal totally ordered subset of X as required.
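

For a finite partially ordered set the recursion defining φ reduces to a single greedy pass, with the order of a list playing the role of the well-ordering. The following is a minimal sketch under that assumption; maximal_chain, divides and the sample data are illustrative names, not part of the text.

```python
def maximal_chain(elements, leq):
    """Greedy construction of a maximal chain in a finite poset.

    `elements` is a list whose order stands in for the well-ordering, and
    `leq(a, b)` is the partial order. An element is kept exactly when it is
    comparable with every element kept before it.
    """
    chain = []
    for a in elements:
        if all(leq(a, b) or leq(b, a) for b in chain):
            chain.append(a)
    return chain


def divides(a, b):
    """Divisibility order on positive integers: a <= b iff a divides b."""
    return b % a == 0


X = [6, 4, 2, 3, 12, 1, 8]
chain = maximal_chain(X, divides)
```

With this particular enumeration the chain obtained is 1 | 2 | 6 | 12; a different enumeration of X may produce a different maximal chain.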
34.98 equivalence of Zorn’s lemma and the axiom of
choice


Let X be a set partially ordered by < such that each chain has an upper bound. Equate
each x ∈ X with p(x) = {y ∈ X | x < y} ∈ P(X). Let p(X) = {p(x) | x ∈ X}. If p(x) = ∅
then it follows that x is maximal.


Suppose no p(x) = ∅. Then by the axiom of choice there is a choice function f on p(X), and
since for each p(x) we have f(p(x)) ∈ p(x), it follows that x < f(p(x)). Define f_α(p(x))
for all ordinals α by transfinite induction:


f_0(p(x)) = x,
f_{α+1}(p(x)) = f(p(f_α(p(x)))).


And for a limit ordinal α, let f_α(p(x)) be an upper bound of the f_i(p(x)) for i < α.


This construction can go on forever, for any ordinal. Then, fixing some x ∈ X, we can easily
construct an injective function from Ord to X by g(α) = f_α(p(x)). But that requires that X
be a proper class, in contradiction to the fact that it is a set. So the assumption that no
p(x) is empty must fail, and there must be a maximal element of X.


For the reverse, assume Zorn’s lemma and let C be any set of non-empty sets. Consider the
set of functions F = {f | ∀a ∈ dom(f) (a ∈ C ∧ f(a) ∈ a)} partially ordered by inclusion.
Then the union of any chain in F is also a member of F (since the union of a chain of
functions is always a function). By Zorn’s lemma, F has a maximal element f, and since
any function with domain smaller than C can be easily expanded, dom(f) = C, and so f is
a choice function on C.
34.99 maximality principle


Let S be a collection of sets. If, for each chain C ⊆ S, there exists an X ∈ S such that every
element of C is a subset of X, then S contains a maximal element. This is known as the
maximality principle.


The maximality principle is equivalent to the axiom of choice.
34.100 principle of finite induction


Let S be a set of positive integers with the following properties:


1. 1 belongs to S, and
2. whenever the integer k is in S, then the next integer k + 1 must also be in S.
Then S is the set of all positive integers.


The Second Principle of Finite Induction would replace (2) above with
2′. If k is a positive integer such that 1, 2, ..., k belong to S, then k + 1 must also be in S.


The Principle of Finite Induction is a consequence of the well-ordering principle.
Version: 3 Owner: KimJ Author(s): KimJ 
34.101 principle of finite induction proven from well-ordering
principle


Let T be the set of all positive integers not in S. Assume T is nonempty. The well-ordering principle
says T contains a least element; call it a. Since 1 ∈ S, we have a > 1, hence 0 < a − 1 < a.
The choice of a as the smallest element of T means a − 1 is not in T, and hence is in S. But
then (a − 1) + 1 is in S, which forces a ∈ S, contradicting a ∈ T. Hence T is empty, and S
is all positive integers.
34.102 proof of Tukey’s lemma


Let S be a set and F a set of subsets of S such that F is of finite character. By Zorn’s lemma,
it is enough to show that F is inductive. For that, it will be enough to show that if (F_i)_{i∈I}
is a family of elements of F which is totally ordered by inclusion, then the union U of the
F_i is an element of F as well (since U is an upper bound on the family (F_i)). So, let K be
a finite subset of U. Each element of U is in F_i for some i ∈ I. Since K is finite and the F_i
are totally ordered by inclusion, there is some j ∈ I such that all elements of K are in F_j.
That is, K ⊆ F_j. Since F is of finite character, we get K ∈ F. As K was an arbitrary finite
subset of U, finite character now gives U ∈ F. QED
34.103 proof of Zermelo’s well-ordering theorem
Let X be any set and let f be a choice function on P(X) \ {∅}. Then define a function i by
transfinite recursion on the class of ordinals as follows:


i(β) = f(X − ⋃_{γ<β} {i(γ)}) unless X − ⋃_{γ<β} {i(γ)} = ∅ or i(γ) is undefined for some γ < β
(the function is undefined if either of the unless clauses holds).


Thus i(0) is just f(X) (the least element of X), and i(1) = f(X − {i(0)}) (the least element
of X other than i(0)).


Define by the axiom of replacement β = i⁻¹[X] = {γ | i(γ) = x for some x ∈ X}. Since β is
a set of ordinals, it cannot contain all the ordinals (by the Burali-Forti paradox).


Since the ordinals are well ordered, there is a least ordinal α not in β, and therefore i(α) is
undefined. It cannot be that the second unless clause holds (since α is the least such ordinal)
so it must be that X − ⋃_{γ<α} {i(γ)} = ∅, and therefore for every x ∈ X there is some γ < α
such that i(γ) = x. Since we already know that i is injective, it is a bijection between α and
X, and therefore establishes a well-ordering of X by x <_X y ↔ i⁻¹(x) < i⁻¹(y).


The reverse is simple. If C is a set of nonempty sets, select any well ordering of ⋃C. Then
a choice function is just f(a) = the least member of a under that well ordering.
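

For a finite set both directions can be imitated directly, with a list index standing in for the ordinal. The sketch below is illustrative only: the function names are invented, and Python's built-in max is used as one arbitrary choice function on nonempty sets of strings.

```python
def well_order_from_choice(X, choose):
    """Enumerate a finite set X by repeatedly applying a choice function.

    `choose` maps each nonempty subset of X to one of its elements. The
    returned list satisfies order[k] = choose(X minus {order[0..k-1]}).
    """
    remaining, order = set(X), []
    while remaining:  # stops once every element of X has been enumerated
        x = choose(frozenset(remaining))
        order.append(x)
        remaining.remove(x)
    return order


def choice_from_well_order(order):
    """The reverse direction: choose the least member under the given ordering."""
    rank = {x: k for k, x in enumerate(order)}
    return lambda a: min(a, key=rank.__getitem__)


X = {"b", "c", "a"}
order = well_order_from_choice(X, max)
f = choice_from_well_order(order)
```

No axiom of choice is needed in the finite case; the example only mirrors the shape of the transfinite argument.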
34.104 axiom of extensionality


If X and Y have the same elements, then X = Y.


The Axiom of Extensionality is one of the axioms of Zermelo-Fraenkel set theory. In symbols,
it reads:
∀u(u ∈ X ↔ u ∈ Y) → X = Y.


Note that the converse,
X = Y → ∀u(u ∈ X ↔ u ∈ Y)


is an axiom of the predicate calculus. Hence we have,


X = Y ↔ ∀u(u ∈ X ↔ u ∈ Y).


Therefore the Axiom of Extensionality expresses the most fundamental notion of a set: a set
is determined by its elements.
34.105 axiom of infinity


There exists an infinite set.
The Axiom of Infinity is an axiom of Zermelo-Fraenkel set theory. At first glance, this axiom
seems to be ill-defined. How are we to know what constitutes an infinite set when we have
not yet defined the notion of a finite set? However, once we have a theory of ordinal numbers
in hand, the axiom makes sense.


Meanwhile, we can give a definition of finiteness that does not rely upon the concept of
number. We do this by introducing the notion of an inductive set. A set S is said to be
inductive if ∅ ∈ S and for every x ∈ S, x ∪ {x} ∈ S. We may then state the Axiom of
Infinity as follows:


There exists an inductive set.


In symbols:


∃S[∅ ∈ S ∧ (∀x ∈ S)[x ∪ {x} ∈ S]]
We shall then be able to prove that the following conditions are equivalent:


1. There exists an inductive set.
2. There exists an infinite set.
3. The least nonzero limit ordinal, ω, is a set.
34.106 axiom of pairing


For any a and b there exists a set {a, b} that contains exactly a and b.


The Axiom of Pairing is one of the axioms of Zermelo-Fraenkel set theory. In symbols, it
reads:


∀a∀b∃c∀x(x ∈ c ↔ x = a ∨ x = b).
Using the axiom of extensionality, we see that the set c is unique, so it makes sense to define
the pair

{a, b} = the unique c such that ∀x(x ∈ c ↔ x = a ∨ x = b).
Using the Axiom of Pairing, we may define, for any set a, the singleton
{a} = {a, a}.
We may also define, for any sets a and b, the ordered pair
(a, b) = {{a}, {a, b}}.

Note that this definition satisfies the condition

(a, b) = (c, d) iff a = c and b = d.


We may define the ordered n-tuple recursively


(a_1, ..., a_n) = ((a_1, ..., a_{n−1}), a_n).
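

The ordered-pair construction above can be mirrored with nested frozensets. This is a sketch for experimentation, with the hypothetical names kpair and ktuple; the components are assumed to be hashable Python values.

```python
def kpair(a, b):
    """Ordered pair (a, b) = {{a}, {a, b}}, built from frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def ktuple(*items):
    """Ordered n-tuple, defined recursively as ((a1, ..., a(n-1)), an)."""
    if len(items) < 2:
        raise ValueError("need at least two components")
    result = kpair(items[0], items[1])
    for x in items[2:]:
        result = kpair(result, x)
    return result
```

Because {a, a} collapses to {a}, kpair(a, a) is the one-element set {{a}}, yet the characteristic property of ordered pairs still holds.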
34.107 axiom of power set


For any X, there exists a set Y = P(X).


The Axiom of Power Set is an axiom of Zermelo-Fraenkel set theory. In symbols, it reads:
∀X∃Y∀u(u ∈ Y ↔ u ⊆ X).


In the above, u ⊆ X is defined as ∀z(z ∈ u → z ∈ X). Hence Y is the set of all subsets of
X. Y is called the power set of X and is denoted P(X). By extensionality, the set Y is
unique.


The Power Set Axiom allows us to define the cartesian product of two sets X and Y:
X × Y = {(x, y) : x ∈ X ∧ y ∈ Y}.


The Cartesian product is a set since


X × Y ⊆ P(P(X ∪ Y)).


We may define the Cartesian product of any finite collection of sets recursively:
X_1 × ··· × X_n = (X_1 × ··· × X_{n−1}) × X_n.
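

The inclusion X × Y ⊆ P(P(X ∪ Y)) can be verified on a small example, using the ordered pair (x, y) = {{x}, {x, y}} from the axiom of pairing entry. The helper names below are assumptions made for this sketch, and the approach is feasible only for very small sets because of the double power set.

```python
from itertools import chain, combinations


def power_set(X):
    """All subsets of X, each represented as a frozenset."""
    items = list(X)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def kpair(a, b):
    """Ordered pair {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def cartesian(X, Y):
    """X x Y as a set of ordered pairs in the set-theoretic encoding."""
    return {kpair(x, y) for x in X for y in Y}


X, Y = {1, 2}, {2, 3}
```

Each pair is a set of subsets of X ∪ Y, hence an element of P(P(X ∪ Y)), which is exactly what the inclusion states.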


Version: 5 Owner: Sabean Author(s): Sabean 


34.108 axiom of union


For any X there exists a set Y = ⋃X.


The Axiom of Union is an axiom of Zermelo-Fraenkel set theory. In symbols, it reads


∀X∃Y∀u(u ∈ Y ↔ ∃z(z ∈ X ∧ u ∈ z)).


Notice that this means that Y is the set of elements of all elements of X. More succinctly,
the union of any set of sets is a set. By extensionality, the set Y is unique. Y is called the
union of X.


In particular, the Axiom of Union, along with the axiom of pairing, allows us to define
X ∪ Y = ⋃{X, Y},


as well as the triple


{a, b, c} = {a, b} ∪ {c}


and therefore the n-tuple


{a_1, ..., a_n} = {a_1} ∪ ··· ∪ {a_n}.
34.109 axiom schema of separation


Let φ(u, p) be a formula. For any X and p, there exists a set Y = {u ∈ X : φ(u, p)}.


The Axiom Schema of Separation is an axiom schema of Zermelo-Fraenkel set theory. Note
that it represents infinitely many individual axioms, one for each formula φ. In symbols, it
reads:


∀X∀p∃Y∀u(u ∈ Y ↔ u ∈ X ∧ φ(u, p)).
By extensionality, the set Y is unique.


The Axiom Schema of Separation implies that φ may depend on more than one parameter
p.


We may show by induction that if φ(u, p_1, ..., p_n) is a formula, then
∀X∀p_1 ··· ∀p_n∃Y∀u(u ∈ Y ↔ u ∈ X ∧ φ(u, p_1, ..., p_n))
holds, using the Axiom Schema of Separation and the axiom of pairing.


Another consequence of the Axiom Schema of Separation is that a subclass of any set is a
set. To see this, let C be the class C = {u : φ(u, p_1, ..., p_n)}. Then


∀X∃Y(C ∩ X = Y)


holds, which means that the intersection of C with any set is a set. Therefore, in particular,
the intersection of two sets X ∩ Y = {x ∈ X : x ∈ Y} is a set. Furthermore the difference
of two sets X − Y = {x ∈ X : x ∉ Y} is a set and, provided there exists at least one set,
which is guaranteed by the axiom of infinity, the empty set is a set. For if X is a set, then
∅ = {x ∈ X : x ≠ x} is a set.


Moreover, if C is a nonempty class, then ⋂C is a set, by Separation. ⋂C is a subset of
every X ∈ C.


Lastly, we may use Separation to show that the class of all sets, V, is not a set, i.e., V is a
proper class. For example, suppose V is a set. Then by Separation
V′ = {x ∈ V : x ∉ x}


is a set and we have reached a Russell paradox.
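

Separation has a familiar finite analogue in set comprehensions, which carve a subset out of an existing set by a predicate. The sketch below is only an analogy: separate is a hypothetical helper, and the predicate phi is an ordinary Python function rather than a formula of set theory.

```python
def separate(X, phi, *params):
    """Return {u in X : phi(u, *params)}, the subset of X carved out by phi."""
    return {u for u in X if phi(u, *params)}


X = set(range(10))
Y = {0, 2, 4, 11}
```

Intersection, difference and the empty set then arise as in the text: separate(X, lambda u: u in Y), separate(X, lambda u: u not in Y) and separate(X, lambda u: u != u).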
34.110 de Morgan’s laws


In set theory, de Morgan’s laws relate the three basic set operations to each other: the
union, the intersection, and the complement. de Morgan’s laws are named after the
Indian-born British mathematician and logician Augustus De Morgan (1806-1871) [1].


If A and B are subsets of a set X, de Morgan’s laws state that
(A ∪ B)^c = A^c ∩ B^c,
(A ∩ B)^c = A^c ∪ B^c.


Here, ∪ denotes the union, ∩ denotes the intersection, and A^c denotes the set complement
of A in X, i.e., A^c = X \ A.


Above, de Morgan’s laws are written for two sets. In this form, they are intuitively quite
clear. For instance, the first claim states that an element that is not in A ∪ B is not in A
and not in B. It also states that an element not in A and not in B is not in A ∪ B.


For an arbitrary collection of subsets, de Morgan’s laws are as follows:


Theorem. Let X be a set with subsets A_i ⊆ X for i ∈ I, where I is an arbitrary index-set.
In other words, I can be finite, countable, or uncountable. Then


(⋃_{i∈I} A_i)^c = ⋂_{i∈I} A_i^c,
(⋂_{i∈I} A_i)^c = ⋃_{i∈I} A_i^c.


(proof)


de Morgan’s laws in a Boolean algebra


For Boolean variables x and y in a Boolean algebra, de Morgan’s laws state that
(x ∧ y)′ = x′ ∨ y′,
(x ∨ y)′ = x′ ∧ y′.


Not surprisingly, de Morgan’s laws form an indispensable tool when simplifying digital
circuits involving and, or, and not gates [2].
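

Both forms of the laws are easy to check mechanically on finite data. The following sketch uses an arbitrary three-member family inside X = {0, ..., 7} for the set version and a truth-table check for the Boolean version; all names and sample values are chosen for illustration.

```python
from itertools import product


def complement(A, X):
    """Complement of A in the ambient set X."""
    return X - A


X = set(range(8))
family = [{0, 1, 2}, {2, 3}, {2, 5, 7}]   # an arbitrary finite family A_i

union_all = set().union(*family)
inter_all = set(X).intersection(*family)


def boolean_de_morgan_holds():
    """Check both Boolean de Morgan laws over the full truth table."""
    return all(
        (not (x and y)) == ((not x) or (not y))
        and (not (x or y)) == ((not x) and (not y))
        for x, y in product([False, True], repeat=2)
    )
```

Comparing complement(union_all, X) with the intersection of the individual complements, and complement(inter_all, X) with their union, reproduces the two set identities on this sample.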


REFERENCES 


1. Wikipedia’s entry on de Morgan, 4/2003.
2. M.M. Mano, Computer Engineering: Hardware Design, Prentice Hall, 1988.
34.111 de Morgan’s laws for sets (proof)


Let X be a set with subsets A_i ⊆ X for i ∈ I, where I is an arbitrary index-set. In other
words, I can be finite, countable, or uncountable. We first show that


(⋃_{i∈I} A_i)′ = ⋂_{i∈I} A_i′,


where A′ denotes the complement of A.


Let us define S = (⋃_{i∈I} A_i)′ and T = ⋂_{i∈I} A_i′. To establish the equality S = T, we shall
use a standard argument for proving equalities in set theory. Namely, we show that S ⊆ T
and T ⊆ S. For the first claim, suppose x is an element in S. Then x ∉ ⋃_{i∈I} A_i, so x ∉ A_i
for any i ∈ I. Hence x ∈ A_i′ for all i ∈ I, and x ∈ ⋂_{i∈I} A_i′ = T. Conversely, suppose x
is an element in T = ⋂_{i∈I} A_i′. Then x ∈ A_i′ for all i ∈ I. Hence x ∉ A_i for any i ∈ I, so
x ∉ ⋃_{i∈I} A_i and x ∈ S.


The second claim,


(⋂_{i∈I} A_i)′ = ⋃_{i∈I} A_i′,


follows by applying the first claim to the sets A_i′.
34.112 set theory


Set theory is special among mathematical theories in two ways: It plays a central role in
putting mathematics on a reliable axiomatic foundation, and it provides the basic language
and apparatus in which most of mathematics is expressed.


34.112.1 Axiomatic set theory


I will informally list the undefined notions, the axioms, and two of the “schemes” of set
theory, along the lines of Bourbaki’s account. The axioms are closer to the von
Neumann-Bernays-Gödel model than to the Zermelo-Fraenkel model. (But some of the axioms are
identical to some in ZFC; see the entry ZermeloFraenkelAxioms.) The intention here is just
to give an idea of the level and scope of these fundamental things.


There are three undefined notions:


1. the relation of equality of two sets


2. the relation of membership of one set in another (x ∈ y)


3. the notion of an ordered pair, which is a set comprised of two other sets, in a
specific order.


Most of the eight schemes belong more properly to logic than to set theory, but they, or
something on the same level, are needed in the work of formalizing any theory that uses the
notion of equality, or uses quantifiers such as ∃. Because of their formal nature, let me just
(informally) state two of the schemes:


S6. If A and B are sets, and A = B, then anything true of A is true of B, and conversely.


S7. If two properties F(x) and G(x) of a set x are equivalent, then the “generic” set having
the property F is the same as the generic set having the property G.


(The notion of a generic set having a given property is formalized with the help of the
Hilbert τ symbol; this is one way, but not the only way, to incorporate what is called the
axiom of choice.)


Finally come the five axioms in this axiomatization of set theory. (Some are identical to
axioms in ZFC, q.v.)


A1. Two sets A and B are equal iff they have the same elements, i.e. iff the relation x ∈ A
implies x ∈ B and vice versa.


A2. For any two sets A and B, there is a set C such that the relation x ∈ C is equivalent to
x = A or x = B.


A3. Two ordered pairs (A, B) and (C, D) are equal iff A = C and B = D.


A4. For any set A, there exists a set B such that x ∈ B is equivalent to x ⊆ A; in other
words, there is a set of all subsets of A, for any given set A.


A5. There exists an infinite set.


The word “infinite” is defined in terms of Axioms A1-A4. But to formulate the definition,
one must first build up some definitions and results about functions and ordered sets, which
we haven’t done here.


34.112.2 Product sets, relations, functions, etc. 


Moving away from foundations and toward applications, all the more complex structures
and relations of set theory are built up out of the three undefined notions. (See the entry
“Set”.) For instance, the relation A ⊆ B between two sets means simply “if x ∈ A then
x ∈ B”.


Using the notion of ordered pair, we soon get the very important structure called the product
A × B of two sets A and B. Next, we can get such things as equivalence relations and order
relations on a set A, for they are subsets of A × A. And we get the critical notion of a
function A → B, as a subset of A × B. Using functions, we get such things as the product
∏_{i∈I} A_i of a family of sets. (“Family” is a variation of the notion of function.)


To be strictly formal, we should distinguish between a function and the graph of that
function, and between a relation and its graph, but the distinction is rarely necessary in practice.


34.112.3 Some structures defined in terms of sets 


The natural numbers provide the first example. Peano, Zermelo and Fraenkel, and others
have given axiom-lists for the set N, with its addition, multiplication, and order relation;
but nowadays the custom is to define even the natural numbers in terms of sets. In more
detail, a natural number is the order-type of a finite well-ordered set. The relation m ≤ n
between m, n ∈ N is defined with the aid of a certain theorem which says, roughly, that for
any two well-ordered sets, one is a segment of the other. The sum or product of two natural
numbers is defined as the cardinal of the sum or product, respectively, of two sets. (For an
extension of this idea, see surreal numbers.)


(The term “cardinal” takes some work to define. The “type” of an ordered set, or any other
kind of structure, is the “generic” structure of that kind, which is defined using τ.)


Groups provide another simple example of a structure defined in terms of sets and ordered
pairs. A group is a pair (G, f) in which G is just a set, and f is a mapping G × G → G
satisfying certain axioms; the axioms (associativity etc.) can all be spelled out in terms of
sets and ordered pairs, although in practice one uses algebraic notation to do it. When we
speak of (e.g.) “the” group S_3 of permutations of a 3-element set, we mean the “type” of
such a group.


Topological spaces provide another example of how mathematical structures can be defined
in terms of, ultimately, the sets and ordered pairs in set theory. A topological space is a pair
(S, U), where the set S is arbitrary, but U has these properties:


- any element of U is a subset of S
- the union of any family (or set) of elements of U is also an element of U
- the intersection of any finite family of elements of U is an element of U.


Many special kinds of topological spaces are defined by enlarging this list of restrictions on
U.


Finally, many kinds of structure are based on more than one set. E.g. a left module is a
commutative group M together with a ring R, plus a mapping R × M → M which satisfies
a specific set of restrictions.


34.112.4 Categories, homological algebra 


Although set theory provides some of the language and apparatus used in mathematics 
generally, that language and apparatus have expanded over time, and now include what are 
called “categories” and “functors”. A category is not a set, and a functor is not a mapping,
despite similarities in both cases. A category comprises all the structured sets of the same 
kind, e.g. the groups, and contains also a definition of the notion of a morphism from one 
such structured set to another of the same kind. A functor is similar to a morphism but 
compares one category to another, not one structured set to another. The classic examples 
are certain functors from the category of topological spaces to the category of groups. 


“Homological algebra” is concerned with of morphisms within a category, plus 
functors from one category to another. One of its aims is to get structure theories for specific 


categories; the of groups and the of are examples. For 
more details on the categories and functors of homological algebra, I recommend a search
for “Eilenberg-Steenrod axioms”.
34.113 union


The union of two sets A and B is the set which contains all x ∈ A and all x ∈ B. The
union of A and B is written as (A ∪ B).


For any sets A and B,


x ∈ A ∪ B ⟺ (x ∈ A) ∨ (x ∈ B)
34.114 universe


A universe U is a nonempty set satisfying the following axioms:


1. If x ∈ U and y ∈ x, then y ∈ U.


2. If x, y ∈ U, then {x, y} ∈ U.


3. If x ∈ U, then the power set P(x) ∈ U.


4. If {x_i | i ∈ I ∈ U} is a family of elements of U, then ⋃_{i∈I} x_i ∈ U.


From these axioms, one can deduce the following properties:


1. If x ∈ U, then {x} ∈ U.

2. If x is a subset of y ∈ U, then x ∈ U.

3. If x, y ∈ U, then the ordered pair (x, y) = {{x, y}, x} is in U.

4. If x, y ∈ U, then x ∪ y and x × y are in U.

5. If {x_i | i ∈ I ∈ U} is a family of elements of U, then the product ∏_{i∈I} x_i is in U.

6. If x ∈ U, then the cardinality of x is strictly less than the cardinality of U. In
particular, U ∉ U.


The standard reference for universes is [SGA4].


REFERENCES 


[SGA4] Grothendieck et al. SGA4. 
34.115 von Neumann-Bernays-Gödel set theory


von Neumann-Bernays-Gödel (commonly referred to as NBG or vNBG) is an axiomatisation
of set theory closely related to the more familiar Zermelo-Fraenkel with choice (ZFC)
axiomatisation. The primary difference between ZFC and NBG is that NBG has proper classes
among its objects. NBG and ZFC are very closely related and are in fact equiconsistent,
NBG being a conservative extension of ZFC.


In NBG, the proper classes are differentiated from sets by the fact that they do not belong
to other classes. Thus in NBG we have


Set(x) ↔ ∃y (x ∈ y)


Another interesting fact about proper classes within NBG is the following limitation of size principle
of von Neumann:


¬Set(x) ↔ |x| = |V|


where V is the set theoretic universe. This principle can in fact replace in NBG essentially
all set existence axioms with the exception of the powerset axiom (and obviously the
axiom of infinity). Thus the classes that are proper in NBG are in a very clear sense big,
while the sets are small.


The NBG set theory can be axiomatised in two different ways 


e Using the Gödel class construction [functions], resulting in a axiomatisation 
e Using a class resulting in an [infinite] axiomatisation 
In the latter alternative we take ZFC and relativise all of its axioms to sets, i.e. we replace 


every expression of form Vx@ with Vx(Set(x) — ¢) and Jro with Jz(Set(x) A ¢) and add 
the class comprehension scheme 


If ¢ is a Formula] with a free variable! x with all its quantifiers|restricted) to 


sets, then the following is an axiom: JAVa(x% € A e @) 


Notice the important restriction to formulae with quantifiers restricted to sets in the scheme. 
This requirement makes the NBG proper classes predicative; you can’t prove the existence 
of a class the definition of which quantifies over all classes. This restriction is essential; if we 
loosen it we get a|theory| that is not conservative over ZFC. If we allow arbitrary formulae in 
the class comprehension axiom scheme we get what is called Morse-Kelley set theory. This 
theory is essentially stronger than ZFC or NBG. In addition to these axioms, NBG also 
[contains] the global axiom of choice! 


ACWx32(C (z = {z}) 


Another way to axiomatise NBG is to use the eight Gödel class construction functions. These 
functions correspond to the various ways in which one can build up formulae (restricted 
to sets!) with set parameters. However, the functions are finite in number and so are the 
resulting axioms governing their behaviour. In particular, since there is a class corresponding 
to any restricted formula, the intersection of any set and this class exists too (and is a set).
Thus the comprehension scheme of ZFC can be replaced with a finite number of axioms, 
provided we allow for proper classes. 


It is easy to show that everything provable in ZF is also provable in NBG. It is also not too difficult to show that NBG without global choice is a conservative extension of ZFC. However, showing that NBG (including global choice) is a conservative extension of ZFC is considerably more difficult. This is equivalent to showing that NBG with global choice is conservative over NBG with only local choice (choice restricted to sets). In order to do this one needs to use (class) forcing. This result is usually credited to Easton and Solovay.
34.116 FS iterated forcing preserves chain condition


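Let $\kappa$ be a regular cardinal and let $\langle \hat{Q}_\beta \rangle_{\beta<\alpha}$ be a finite support iterated forcing where for every $\beta < \alpha$, $\Vdash_{P_\beta} \hat{Q}_\beta$ has the $\kappa$ chain condition.


By induction:


$P_0$ is the empty set.


If $P_\alpha$ satisfies the $\kappa$ chain condition then so does $P_{\alpha+1}$, since $P_{\alpha+1}$ is equivalent to $P_\alpha * Q_\alpha$ and composition preserves the $\kappa$ chain condition for regular $\kappa$.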


Suppose $\alpha$ is a limit ordinal and $P_\beta$ satisfies the $\kappa$ chain condition for all $\beta < \alpha$. Let $S = \langle p_i \rangle_{i<\kappa}$ be a subset of $P_\alpha$ of size $\kappa$. The domains of the elements $p_i$ form $\kappa$ finite subsets of $\alpha$, so if $\operatorname{cf}(\alpha) > \kappa$ then these are bounded and by the inductive hypothesis, two of them are compatible.


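Otherwise, if $\operatorname{cf}(\alpha) < \kappa$, let $\langle \alpha_j \rangle_{j<\operatorname{cf}(\alpha)}$ be an increasing sequence of ordinals cofinal in $\alpha$. Then for any $i < \kappa$ there is some $n(i) < \operatorname{cf}(\alpha)$ such that $\operatorname{dom}(p_i) \subseteq \alpha_{n(i)}$. Since $\kappa$ is regular and this is a partition of $\kappa$ into fewer than $\kappa$ pieces, one piece must have size $\kappa$, that is, there is some $j$ such that $j = n(i)$ for $\kappa$ values of $i$, and so $\{p_i \mid n(i) = j\}$ is a set of conditions of size $\kappa$ contained in $P_{\alpha_j}$, and therefore contains compatible members by the induction hypothesis.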


Finally, if $\operatorname{cf}(\alpha) = \kappa$, let $C = \langle \alpha_j \rangle_{j<\kappa}$ be a strictly increasing continuous sequence cofinal in $\alpha$. Then for every $i < \kappa$ there is some $n(i) < \kappa$ such that $\operatorname{dom}(p_i) \subseteq \alpha_{n(i)}$. When $n(i)$ is a limit ordinal, since $C$ is continuous, there is also (since $\operatorname{dom}(p_i)$ is finite) some $f(i) < i$ such that $\operatorname{dom}(p_i) \cap [\alpha_{f(i)}, \alpha_i) = \emptyset$. Consider the set $E$ of elements $i$ such that $i$ is a limit ordinal and for any $j < i$, $n(j) < i$. This is a club, so by Fodor’s lemma there is some $j$ such that $\{i \mid f(i) = j\}$ is stationary.


For each $p_i$ such that $f(i) = j$, consider $p'_i = p_i \restriction j$. There are $\kappa$ of these, all members of $P_j$, so two of them must be compatible, and hence those two are also compatible in $P$.
34.117 chain condition


A partial order $P$ satisfies the $\kappa$-chain condition if for any $S \subseteq P$ with $|S| = \kappa$ there exist distinct $x, y \in S$ such that either $x \leq y$ or $y \leq x$.


If $\kappa = \aleph_1$ then $P$ is said to satisfy the countable chain condition (c.c.c.).
34.118 composition of forcing notions


Suppose $P$ is a forcing notion in $\mathfrak{M}$ and $\hat{Q}$ is some $P$-name such that $\Vdash_P \hat{Q}$ is a forcing notion.


Then take a set of $P$-names $Q$ such that given a $P$-name $\tilde{Q}$ of $Q$, $\Vdash_P \tilde{Q} = \hat{Q}$ (that is, no matter which generic subset $G$ of $P$ we force with, the names in $Q$ correspond precisely to the elements of $\hat{Q}[G]$). We can define


\[ P * Q = \{ \langle p, \hat{q} \rangle \mid p \in P,\ \hat{q} \in Q \} \]


We can define a partial order on $P * Q$ such that $\langle p_1, \hat{q}_1 \rangle \leq \langle p_2, \hat{q}_2 \rangle$ if $p_1 \leq_P p_2$ and $p_1 \Vdash \hat{q}_1 \leq_{\hat{Q}} \hat{q}_2$. (A note on interpretation: $\hat{q}_1$ and $\hat{q}_2$ are $P$-names; this requires only that $\hat{q}_1 \leq \hat{q}_2$ in generic subsets containing $p_1$, so in other generic subsets that fact could fail.)


Then $P * Q$ is itself a forcing notion, and it can be shown that forcing by $P * Q$ is equivalent to forcing first by $P$ and then by $\hat{Q}[G]$.
34.119 composition preserves chain condition


Let $\kappa$ be a regular cardinal. Let $P$ be a forcing notion satisfying the $\kappa$ chain condition. Let $\hat{Q}$ be a $P$-name such that $\Vdash_P \hat{Q}$ is a forcing notion satisfying the $\kappa$ chain condition. Then $P * Q$ satisfies the $\kappa$ chain condition.
Proof:


Outline


We prove that there is some $p$ such that any generic subset of $P$ including $p$ also includes $\kappa$ of the $p_i$. Then, since $\hat{Q}[G]$ satisfies the $\kappa$ chain condition, two of the corresponding $\hat{q}_i$ must be compatible. Then, since $G$ is directed, there is some $p$ stronger than any of these which forces this to be true, and therefore makes two elements of $S$ compatible.


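Let $S = \langle p_i, \hat{q}_i \rangle_{i<\kappa} \subseteq P * Q$.


Claim: There is some $p \in P$ such that $p \Vdash |\{i \mid p_i \in \hat{G}\}| = \kappa$


(Note: $\hat{G} = \{\langle p, p \rangle \mid p \in P\}$, hence $\hat{G}[G] = G$)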


If no $p$ forces this then every $p$ forces that it is not true, and therefore $\Vdash_P |\{i \mid p_i \in \hat{G}\}| < \kappa$. Since $\kappa$ is regular this means that for any generic $G \subseteq P$, $\{i \mid p_i \in G\}$ is bounded. For each $G$, let $f(G)$ be the least $\alpha$ such that $\beta < \alpha$ implies that there is some $\gamma > \beta$ such that $p_\gamma \in G$. Define $B = \{\alpha \mid \alpha = f(G) \text{ for some } G\}$.


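Claim: $|B| < \kappa$


If $\alpha \in B$ then there is some $p_\alpha \in P$ such that $p_\alpha \Vdash f(\hat{G}) = \alpha$, and if $\alpha, \beta \in B$ are distinct then $p_\alpha$ must be incompatible with $p_\beta$. Since $P$ satisfies the $\kappa$ chain condition, it follows that $|B| < \kappa$.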


Since $\kappa$ is regular, $\alpha = \sup(B) < \kappa$. But obviously $p_{\alpha+1} \Vdash p_{\alpha+1} \in \hat{G}$. This is a contradiction, so we conclude that there must be some $p$ such that $p \Vdash |\{i \mid p_i \in \hat{G}\}| = \kappa$.


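If $G \subseteq P$ is any generic subset containing $p$ then $A = \{\hat{q}_i[G] \mid p_i \in G\}$ must have cardinality $\kappa$. Since $\hat{Q}[G]$ satisfies the $\kappa$ chain condition, there exist $i, j < \kappa$ such that $p_i, p_j \in G$ and there is some $\hat{q}[G] \in \hat{Q}[G]$ such that $\hat{q}[G] \leq \hat{q}_i[G], \hat{q}_j[G]$. Then since $G$ is directed, there is some $p' \in G$ such that $p' \leq p_i, p_j, p$ and $p' \Vdash \hat{q} \leq \hat{q}_i, \hat{q}_j$. So $\langle p', \hat{q} \rangle \leq \langle p_i, \hat{q}_i \rangle, \langle p_j, \hat{q}_j \rangle$.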
34.120 equivalence of forcing notions


Let $P$ and $Q$ be two forcing notions such that given any generic subset $G$ of $P$ there is a generic subset $H$ of $Q$ with $\mathfrak{M}[G] = \mathfrak{M}[H]$ and vice-versa. Then $P$ and $Q$ are equivalent.


Since if $G \in \mathfrak{M}[H]$, $\tau[G] \in \mathfrak{M}[H]$ for any $P$-name $\tau$, it follows that if $G \in \mathfrak{M}[H]$ and $H \in \mathfrak{M}[G]$ then $\mathfrak{M}[G] = \mathfrak{M}[H]$.


Version: 2 Owner: Henry Author(s): Henry 


34.121 forcing relation 


If $\mathfrak{M}$ is a transitive model of set theory and $P$ is a partial order then we can define a forcing relation:


\[ p \Vdash_P \phi(\tau_1, \ldots, \tau_n) \]


($p$ forces $\phi(\tau_1, \ldots, \tau_n)$)
for any $p \in P$, where $\tau_1, \ldots, \tau_n$ are $P$-names.


Specifically, the relation holds if for every generic filter $G$ over $P$ which contains $p$,


\[ \mathfrak{M}[G] \models \phi(\tau_1[G], \ldots, \tau_n[G]) \]


That is, $p$ forces $\phi$ if every extension of $\mathfrak{M}$ by a generic filter over $P$ containing $p$ makes $\phi$ true.


If $p \Vdash_P \phi$ holds for every $p \in P$ then we can write $\Vdash_P \phi$ to mean that for any generic $G \subseteq P$, $\mathfrak{M}[G] \models \phi$.
34.122 forcings are equivalent if one is dense in the other


Suppose $P$ and $Q$ are forcing notions and that $f : P \rightarrow Q$ is a function such that:


• $p_1 \leq_P p_2$ implies $f(p_1) \leq_Q f(p_2)$


• If $p_1, p_2 \in P$ are incomparable then $f(p_1), f(p_2)$ are incomparable


• $f[P]$ is dense in $Q$


then $P$ and $Q$ are equivalent.
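

The three hypotheses can be checked mechanically on finite partial orders. The following is only a sketch on a toy example of our own choosing (bit strings ordered by extension); the helper names `is_dense_embedding`, `comparable` and `extends` are hypothetical, and finite orders of course do not capture genericity.

```python
from itertools import product

def comparable(leq, a, b):
    """True when a and b are related in either direction."""
    return leq(a, b) or leq(b, a)

def is_dense_embedding(P, leq_P, Q, leq_Q, f):
    """Check the three listed conditions for f : P -> Q on finite posets."""
    order_preserving = all(leq_Q(f(a), f(b))
                           for a, b in product(P, repeat=2) if leq_P(a, b))
    keeps_incomparable = all(not comparable(leq_Q, f(a), f(b))
                             for a, b in product(P, repeat=2)
                             if not comparable(leq_P, a, b))
    # f[P] is dense in Q: every q has some f(p) below it
    image_dense = all(any(leq_Q(f(p), q) for p in P) for q in Q)
    return order_preserving and keeps_incomparable and image_dense

def extends(s, t):
    """s <= t for bit strings: s starts with t (longer strings are stronger)."""
    return s[:len(t)] == t

# Toy example: Q is all bit strings of length <= 2.
Q = [bits for n in range(3) for bits in product((0, 1), repeat=n)]
P_full = [s for s in Q if len(s) == 2]    # image is dense in Q
P_short = [s for s in Q if len(s) == 1]   # nothing of length 1 lies below (0, 0)
identity = lambda s: s

full_ok = is_dense_embedding(P_full, extends, Q, extends, identity)
short_ok = is_dense_embedding(P_short, extends, Q, extends, identity)
```

Here the inclusion of the length-2 strings passes all three checks, while the inclusion of the length-1 strings fails only the density condition.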
Proof


We seek to provide two operations (computable in the appropriate universes) which convert between generic subsets of $P$ and $Q$, and to prove that they are inverses.


F(G) = H where H is generic 


Given a generic $G \subseteq P$, consider $H = \{q \mid f(p) \leq q \text{ for some } p \in G\}$.


If $q_1 \in H$ and $q_1 \leq q_2$ then $q_2 \in H$ by the definition of $H$. If $q_1, q_2 \in H$ then let $p_1, p_2 \in G$ be such that $f(p_1) \leq q_1$ and $f(p_2) \leq q_2$. Then there is some $p_3 \leq p_1, p_2$ such that $p_3 \in G$, and since $f$ is order preserving, $f(p_3) \leq f(p_1) \leq q_1$ and $f(p_3) \leq f(p_2) \leq q_2$.


Suppose $D$ is a dense subset of $Q$. Since $f[P]$ is dense in $Q$, for any $d \in D$ there is some $p \in P$ such that $f(p) \leq d$. For each $d \in D$, assign (using the axiom of choice) some $d_p \in P$ such that $f(d_p) \leq d$, and call the set of these $D_P$. This is dense in $P$, since for any $p \in P$ there is some $d \in D$ such that $d \leq f(p)$, and so some $d_p \in D_P$ such that $f(d_p) \leq d$. If $d_p \leq p$ then $D_P$ is dense, so suppose $d_p \not\leq p$. If $d_p \leq p$ then this provides a member of $D_P$ less than $p$; alternatively, since $f(d_p)$ and $f(p)$ are compatible, $d_p$ and $p$ are compatible, so $p \leq d_p$, and therefore $f(p) = f(d_p) = d$, so $p \in D_P$. Since $D_P$ is dense in $P$, there is some element $p \in D_P \cap G$. Since $p \in D_P$, there is some $d \in D$ such that $f(p) \leq d$. But since $p \in G$, $d \in H$, so $H$ intersects $D$.


G can be recovered from F(G) 


Given $H$ constructed as above, we can recover $G$ as the set of $p \in P$ such that $f(p) \in H$. Obviously every element from $G$ is included in the new set, so consider some $p$ such that $f(p) \in H$. By definition, there is some $p_1 \in G$ such that $f(p_1) \leq f(p)$. Take some dense $D \subseteq Q$ such that there is no $d \in D$ such that $f(p) \leq d$ (this can be done easily by taking any dense subset and removing all such elements; the resulting set is still dense since there is some $d_1$ such that $d_1 \leq f(p) \leq d$). This set intersects $f[G]$ in some $q$, so there is some $p_2 \in G$ such that $f(p_2) \leq q$, and since $G$ is directed, some $p_3 \in G$ such that $p_3 \leq p_2, p_1$. So $f(p_3) \leq f(p_1) \leq f(p)$. If $p_3 \not\leq p$ then we would have $p \leq p_3$ and then $f(p) \leq f(p_3) \leq q$, contradicting the definition of $D$, so $p_3 \leq p$ and $p \in G$ since $G$ is closed.


$F^{-1}(H) = G$ where $G$ is generic


Given any generic $H$ in $Q$, we define a corresponding $G$ as above: $G = \{p \in P \mid f(p) \in H\}$. If $p_1 \in G$ and $p_1 \leq p_2$ then $f(p_1) \in H$ and $f(p_1) \leq f(p_2)$, so $p_2 \in G$ since $H$ is closed. If $p_1, p_2 \in G$ then $f(p_1), f(p_2) \in H$ and there is some $q \in H$ such that $q \leq f(p_1), f(p_2)$.


Consider $D$, the set of elements of $Q$ which are $f(p)$ for some $p \in P$ and either $f(p) \leq q$ or there is no element less than both $f(p)$ and $q$. This is dense, since given any $q_1 \in Q$, if $q_1 \leq q$ then (since $f[P]$ is dense) there is some $p$ such that $f(p) \leq q_1 \leq q$. If $q \leq q_1$ then there is some $p$ such that $f(p) \leq q \leq q_1$. If neither of these holds and there is some $r \leq q_1, q$ then any $p$ such that $f(p) \leq r$ suffices, and if there is no such $r$ then any $p$ such that $f(p) \leq q_1$ suffices.


There is some $f(p) \in D \cap H$, and so $p \in G$. Since $H$ is directed, there is some $r \leq f(p), q$, so $f(p) \leq q \leq f(p_1), f(p_2)$. If it is not the case that $f(p) \leq f(p_1)$ then $f(p) = f(p_1) = f(p_2)$. In either case, we confirm that $G$ is directed.


Finally, let $D$ be a dense subset of $P$. $f[D]$ is dense in $Q$, since given any $q \in Q$, there is some $p \in P$ such that $f(p) \leq q$, and some $d \in D$ such that $d \leq p$, so $f(d) \leq f(p) \leq q$. So there is some $f(p) \in f[D] \cap H$, and so $p \in D \cap G$.


$H$ can be recovered from $F^{-1}(H)$


Finally, given $G$ constructed by this method, $H = \{q \mid f(p) \leq q \text{ for some } p \in G\}$. To see this, if there is some $f(p)$ for $p \in G$ such that $f(p) \leq q$ then $f(p) \in H$ so $q \in H$. On the other hand, if $q \in H$ then the set of $f(p)$ such that either $f(p) \leq q$ or there is no $r \in Q$ such that $r \leq q, f(p)$ is dense (as shown above), and so intersects $H$. But since $H$ is directed, it must be that there is some $f(p) \in H$ such that $f(p) \leq q$, and therefore $p \in G$.
34.123 iterated forcing


We can define an iterated forcing of length $\alpha$ by induction as follows:


Let $P_0 = \emptyset$.


Let $\hat{Q}_0$ be a forcing notion.


For $\beta \leq \alpha$, $P_\beta$ is the set of all functions $f$ such that $\operatorname{dom}(f) \subseteq \beta$ and for any $i \in \operatorname{dom}(f)$, $f(i)$ is a $P_i$-name for a member of $\hat{Q}_i$. Order $P_\beta$ by the rule $f \leq g$ iff $\operatorname{dom}(g) \subseteq \operatorname{dom}(f)$ and for any $i \in \operatorname{dom}(f)$, $g \restriction i \Vdash f(i) \leq_{\hat{Q}_i} g(i)$. (Translated, this means that any generic subset including $g$ restricted to $i$ forces that $f(i)$, an element of $\hat{Q}_i$, be less than $g(i)$.)


For $\beta < \alpha$, $\hat{Q}_\beta$ is a forcing notion in $P_\beta$ (so $\Vdash_{P_\beta} \hat{Q}_\beta$ is a forcing notion).


Then the sequence $\langle \hat{Q}_\beta \rangle_{\beta<\alpha}$ is an iterated forcing.


If $P_\beta$ is restricted to finite functions then it is called a finite support iterated forcing (FS), if $P_\beta$ is restricted to countable functions, it is called a countable support iterated forcing (CS), and in general if each function in each $P_\beta$ has size less than $\kappa$ then it is a $<\kappa$-support iterated forcing.


Typically we construct the sequence of $\hat{Q}_\beta$’s by induction, using a function $F$ such that $F(\langle \hat{Q}_\beta \rangle_{\beta<\gamma}) = \hat{Q}_\gamma$.
34.124 iterated forcing and composition


There is a function $f : P_\alpha * Q_\alpha \rightarrow P_{\alpha+1}$ satisfying the hypotheses of the entry “forcings are equivalent if one is dense in the other”.


Proof 


Let $f(\langle g, \hat{q} \rangle) = g \cup \{\langle \alpha, \hat{q} \rangle\}$. This is obviously a member of $P_{\alpha+1}$, since it is a partial function from $\alpha + 1$ (and if the domain of $g$ is less than $\alpha$ then so is the domain of $f(\langle g, \hat{q} \rangle)$), if $i < \alpha$ then obviously $f(\langle g, \hat{q} \rangle)$ applied to $i$ satisfies the definition of iterated forcing (since $g$ does), and if $i = \alpha$ then the definition is satisfied since $\hat{q}$ is a name in $P_i$ for a member of $Q_i$.


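$f$ is order preserving, since if $\langle g_1, \hat{q}_1 \rangle \leq \langle g_2, \hat{q}_2 \rangle$, all the appropriate properties of a function carry over to the image, and $g_1 \restriction \alpha \Vdash_{P_\alpha} \hat{q}_1 \leq \hat{q}_2$ (by the definition of $\leq$ in $*$).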


If $\langle g_1, \hat{q}_1 \rangle$ and $\langle g_2, \hat{q}_2 \rangle$ are incomparable then either $g_1$ and $g_2$ are incomparable, in which case whatever prevents them from being compared applies to their images as well, or $\hat{q}_1$ and $\hat{q}_2$ aren’t compared appropriately, in which case again this prevents the images from being compared.


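Finally, let $g$ be any element of $P_{\alpha+1}$. Then $g \restriction \alpha \in P_\alpha$. If $\alpha \notin \operatorname{dom}(g)$ then this is just $g$, and $f(\langle g, \hat{q} \rangle) \leq g$ for any $\hat{q}$. If $\alpha \in \operatorname{dom}(g)$ then $f(\langle g \restriction \alpha, g(\alpha) \rangle) = g$. Hence $f[P_\alpha * Q_\alpha]$ is dense in $P_{\alpha+1}$, and so these are equivalent.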
34.125 name


We need a way to refer to objects of $\mathfrak{M}[G]$ within $\mathfrak{M}$. This is done by assigning a name to each element of $\mathfrak{M}[G]$.


Given a partial order $P$, we construct the $P$-names by induction. Each name is just a relation between $P$ and the set of names already constructed; that is, a name is a set of ordered pairs of the form $(p, \tau)$ where $p \in P$ and $\tau$ is a name constructed at an earlier level of the induction.


Given a generic subset $G \subseteq P$, we can then define the interpretation $\tau[G]$ of a $P$-name $\tau$ in $\mathfrak{M}[G]$ by:


\[ \tau[G] = \{ \tau'[G] \mid (p, \tau') \in \tau \text{ for some } p \in G \} \]


Of course, two different names can have the same interpretation. 


The generic subset can be thought of as a “key” which reveals which potential elements of $\tau$ are actually elements.


Any element $x \in \mathfrak{M}$ can be given a canonical name


\[ \hat{x} = \{ (p, \hat{y}) \mid y \in x,\ p \in P \} \]


This guarantees that the elements of $\hat{x}[G]$ will be exactly the same as the elements of $x$, regardless of which members of $P$ are contained in $G$.
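

The recursive definitions of interpretation and canonical name can be tried out on hereditarily finite sets. This is a sketch under simplifying assumptions: the two conditions `"a"` and `"b"` are hypothetical, `G` is just a nonempty set of conditions rather than a genuinely generic filter, and the function names are ours.

```python
def interpret(name, G):
    """tau[G]: collect sigma[G] for every pair (p, sigma) in tau with p in G."""
    return frozenset(interpret(sigma, G) for (p, sigma) in name if p in G)

def canonical_name(x, P):
    """Name pairing every condition with the canonical name of every element of x."""
    return frozenset((p, canonical_name(y, P)) for y in x for p in P)

P = ("a", "b")                      # two hypothetical conditions
empty = frozenset()                 # the set 0
one = frozenset({empty})            # the set 1 = {0}
two = frozenset({empty, one})       # the set 2 = {0, 1}

two_name = canonical_name(two, P)

# A non-canonical name: which potential elements appear depends on G.
tau = frozenset({("a", canonical_name(empty, P)),
                 ("b", canonical_name(one, P))})
```

With these definitions `interpret(two_name, G)` returns `two` for every nonempty `G`, whereas `interpret(tau, G)` changes with `G`, which is the "key" behaviour described above.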
34.126 partial order with chain condition does not collapse cardinals


If $P$ is a partial order which satisfies the $\kappa$ chain condition and $G$ is a generic subset of $P$ then for any cardinal $\lambda > \kappa$ of $\mathfrak{M}$, $\lambda$ is also a cardinal in $\mathfrak{M}[G]$, and if $\operatorname{cf}(\alpha) = \lambda$ in $\mathfrak{M}$ then also $\operatorname{cf}(\alpha) = \lambda$ in $\mathfrak{M}[G]$.


This theorem is the simplest way to control a notion of forcing, since it means that a notion of forcing does not have an effect above a certain point. Given that any $P$ satisfies the $|P|^+$ chain condition, this means that most forcings leave all of $\mathfrak{M}$ above a certain point alone. (Although it is possible to get around this limit by forcing with a proper class.)
34.127 proof of partial order with chain condition does not collapse cardinals


Outline:


Given any function $f$ purporting to violate the theorem by being surjective (or cofinal) on $\lambda$, we show that there are fewer than $\kappa$ possible values of $f(\alpha)$, and therefore only $\max(\alpha, \kappa)$ possible elements in the entire range of $f$, so $f$ is not surjective (or cofinal).


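Details:


Suppose $\lambda > \kappa$ is a cardinal of $\mathfrak{M}$ that is not a cardinal in $\mathfrak{M}[G]$.


There is some function $f \in \mathfrak{M}[G]$ and some cardinal $\alpha < \lambda$ such that $f : \alpha \rightarrow \lambda$ is surjective. This has a name $\hat{f}$. For each $\beta < \alpha$, consider


\[ F_\beta = \{ \gamma < \lambda \mid p \Vdash \hat{f}(\beta) = \gamma \text{ for some } p \in P \} \]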


$|F_\beta| < \kappa$, since any two $p \in P$ which force different values for $\hat{f}(\beta)$ are incompatible and $P$ has no sets of incompatible elements of size $\kappa$.


Notice that $F_\beta$ is definable in $\mathfrak{M}$. Then the range of $f$ must be contained in $F = \bigcup_{i<\alpha} F_i$. But $|F| \leq \alpha \cdot \kappa = \max(\alpha, \kappa) < \lambda$. So $f$ cannot possibly be surjective, and therefore $\lambda$ is not collapsed.
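

To make the counting step concrete, here is the same bound in one sample case. The values $\kappa = \aleph_1$ (so that $P$ is c.c.c. and each $F_\beta$ is countable), $\alpha = \aleph_1$ and $\lambda = \aleph_2$ are our own choice for illustration and are not part of the proof.

```latex
\[
|F| \;=\; \Bigl|\bigcup_{\beta<\aleph_1} F_\beta\Bigr|
\;\le\; \aleph_1 \cdot \aleph_0
\;=\; \aleph_1 \;<\; \aleph_2
\]
```

So in this case no function from $\aleph_1$ to $\aleph_2$ in $\mathfrak{M}[G]$ can be surjective, and the $\aleph_2$ of $\mathfrak{M}$ remains a cardinal in the extension.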


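Now suppose that for some $\alpha > \lambda > \kappa$, $\operatorname{cf}(\alpha) = \lambda$ in $\mathfrak{M}$ and for some $\eta < \lambda$ there is a cofinal function $f : \eta \rightarrow \alpha$.


We can construct $F_\beta$ as above, and again the range of $f$ is contained in $F = \bigcup_{i<\eta} F_i$. But then $|\operatorname{range}(f)| \leq |F| \leq \eta \cdot \kappa < \lambda$. So there is some $\gamma < \alpha$ such that $f(\beta) < \gamma$ for any $\beta < \eta$, and therefore $f$ is not cofinal in $\alpha$.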
34.128 proof that forcing notions are equivalent to their composition


This is a long and complicated proof, the more so because the meaning of $Q$ shifts depending on what generic subset of $P$ is being used. It is therefore broken into a number of steps. The core of the proof is to prove that, given any generic subset $G$ of $P$ and a generic subset $H$ of $\hat{Q}[G]$ there is a corresponding generic subset $G * H$ of $P * Q$ such that $\mathfrak{M}[G][H] = \mathfrak{M}[G * H]$, and conversely, given any generic subset $G$ of $P * Q$ we can find some generic $G_P$ of $P$ and a generic $G_Q$ of $\hat{Q}[G_P]$ such that $\mathfrak{M}[G_P][G_Q] = \mathfrak{M}[G]$.


We do this by constructing functions using operations which can be performed within the forced universes so that, for example, since $\mathfrak{M}[G][H]$ has both $G$ and $H$, $G * H$ can be calculated, proving that it contains $\mathfrak{M}[G * H]$. To ensure equality, we will also have to ensure that our operations are inverses; that is, given $G$, $G_P * G_Q = G$ and given $G$ and $H$, $(G * H)_P = G$ and $(G * H)_Q = H$.


The remainder of the proof merely defines the precise operations, proves that they give generic sets, and proves that they are inverses.


Before beginning, we prove a lemma which comes up several times:


Lemma: If $G$ is generic in $P$ and $D$ is dense above some $p \in G$ then $G \cap D \neq \emptyset$.


Let $D' = \{p' \in P \mid p' \in D \vee p' \text{ is incompatible with } p\}$. This is dense, since if $p_0 \in P$ then either $p_0$ is incompatible with $p$, in which case $p_0 \in D'$, or there is some $p_1$ such that $p_1 \leq p, p_0$, and therefore there is some $p_2 \leq p_1$ such that $p_2 \in D$, and therefore $p_2 \leq p_0$. So $G$ intersects $D'$. But since a generic set is directed, no two elements are incompatible, so $G$ must contain an element of $D'$ which is not incompatible with $p$, so it must contain an element of $D$.


$G * H$ is a generic filter


First, given generic subsets $G$ and $H$ of $P$ and $\hat{Q}[G]$, we can define:


\[ G * H = \{ \langle p, \hat{q} \rangle \mid p \in G \wedge \hat{q}[G] \in H \} \]


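$G * H$ is closed


Let $\langle p_1, \hat{q}_1 \rangle \in G * H$ and let $\langle p_1, \hat{q}_1 \rangle \leq \langle p_2, \hat{q}_2 \rangle$. Then we can conclude $p_1 \in G$, $p_1 \leq p_2$, $\hat{q}_1[G] \in H$, and $p_1 \Vdash \hat{q}_1 \leq \hat{q}_2$, so $p_2 \in G$ (since $G$ is closed) and $\hat{q}_2[G] \in H$ since $p_1 \in G$ and $p_1$ forces both $\hat{q}_1 \leq \hat{q}_2$ and that $H$ is closed. So $\langle p_2, \hat{q}_2 \rangle \in G * H$.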


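$G * H$ is directed


Suppose $\langle p_1, \hat{q}_1 \rangle, \langle p_2, \hat{q}_2 \rangle \in G * H$. So $p_1, p_2 \in G$, and since $G$ is directed, there is some $p_3 \leq p_1, p_2$. Since $\hat{q}_1[G], \hat{q}_2[G] \in H$ and $H$ is directed, there is some $\hat{q}_3[G] \leq \hat{q}_1[G], \hat{q}_2[G]$. Therefore there is some $p_4 \leq p_3$, $p_4 \in G$, such that $p_4 \Vdash \hat{q}_3 \leq \hat{q}_1, \hat{q}_2$, so $\langle p_4, \hat{q}_3 \rangle \leq \langle p_1, \hat{q}_1 \rangle, \langle p_2, \hat{q}_2 \rangle$ and $\langle p_4, \hat{q}_3 \rangle \in G * H$.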


$G * H$ is generic


Suppose $D$ is a dense subset of $P * Q$. We can project it into a dense subset of $\hat{Q}[G]$ using $G$:


\[ D_Q = \{ \hat{q}[G] \mid \langle p, \hat{q} \rangle \in D \text{ for some } p \in G \} \]


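Lemma: $D_Q$ is dense in $\hat{Q}[G]$


Given any $\hat{q}_0 \in Q$, take any $p_0 \in G$. Then we can define yet another dense subset, this one in $G$:


\[ D_{\hat{q}_0} = \{ p \mid p \leq p_0 \wedge p \Vdash \hat{q} \leq \hat{q}_0 \wedge \langle p, \hat{q} \rangle \in D \text{ for some } \hat{q} \in Q \} \]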


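Lemma: $D_{\hat{q}_0}$ is dense above $p_0$ in $P$


Take any $p \in P$ such that $p \leq p_0$. Then, since $D$ is dense in $P * Q$, we have some $\langle p_1, \hat{q}_1 \rangle \leq \langle p, \hat{q}_0 \rangle$ such that $\langle p_1, \hat{q}_1 \rangle \in D$. Then by definition $p_1 \leq p$ and $p_1 \in D_{\hat{q}_0}$.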


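From this lemma, we can conclude that there is some $p_1 \leq p_0$ such that $p_1 \in G \cap D_{\hat{q}_0}$, and therefore some $\hat{q}_1$ such that $p_1 \Vdash \hat{q}_1 \leq \hat{q}_0$ where $\langle p_1, \hat{q}_1 \rangle \in D$. So $D_Q$ is indeed dense in $\hat{Q}[G]$.


Since $D_Q$ is dense in $\hat{Q}[G]$, there is some $\hat{q}$ such that $\hat{q}[G] \in D_Q \cap H$, and so some $p \in G$ such that $\langle p, \hat{q} \rangle \in D$. But since $p \in G$ and $\hat{q}[G] \in H$, $\langle p, \hat{q} \rangle \in G * H$, so $G * H$ is indeed generic.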


$G_P$ is a generic filter


Given some generic subset $G$ of $P * Q$, let:


\[ G_P = \{ p \in P \mid p' \leq p \wedge \langle p', \hat{q} \rangle \in G \text{ for some } p' \in P \text{ and some } \hat{q} \in Q \} \]


$G_P$ is closed


Take any $p_1 \in G_P$ and any $p_2$ such that $p_1 \leq p_2$. Then there is some $p' \leq p_1$ satisfying the definition of $G_P$, and also $p' \leq p_2$, so $p_2 \in G_P$.


$G_P$ is directed


Consider $p_1, p_2 \in G_P$. Then there is some $p'_1$ and some $\hat{q}_1$ such that $\langle p'_1, \hat{q}_1 \rangle \in G$ and some $p'_2$ and some $\hat{q}_2$ such that $\langle p'_2, \hat{q}_2 \rangle \in G$. Since $G$ is directed, there is some $\langle p_3, \hat{q}_3 \rangle \in G$ such that $\langle p_3, \hat{q}_3 \rangle \leq \langle p'_1, \hat{q}_1 \rangle, \langle p'_2, \hat{q}_2 \rangle$, and therefore $p_3 \in G_P$, $p_3 \leq p_1, p_2$.


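$G_P$ is generic


Let $D$ be a dense subset of $P$, and let $D' = \{\langle p, \hat{q} \rangle \mid p \in D\}$. Clearly this is dense, since if $\langle p, \hat{q} \rangle \in P * Q$ then there is some $p' \leq p$ such that $p' \in D$, so $\langle p', \hat{q} \rangle \in D'$ and $\langle p', \hat{q} \rangle \leq \langle p, \hat{q} \rangle$. So there is some $\langle p, \hat{q} \rangle \in D' \cap G$, and therefore $p \in D \cap G_P$. So $G_P$ is generic.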


$G_Q$ is a generic filter


Given a generic subset $G \subseteq P * Q$, define:


\[ G_Q = \{ \hat{q}[G_P] \mid \langle p, \hat{q} \rangle \in G \text{ for some } p \in P \} \]


(Notice that $G_Q$ is dependent on $G_P$, and is a subset of $\hat{Q}[G_P]$, that is, the forcing notion inside $\mathfrak{M}[G_P]$, as opposed to the set of names $Q$ which we’ve been primarily working with.)


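$G_Q$ is closed


Suppose $\hat{q}_1[G_P] \in G_Q$ and $\hat{q}_1[G_P] \leq \hat{q}_2[G_P]$. Then there is some $p_1 \in G_P$ such that $p_1 \Vdash \hat{q}_1 \leq \hat{q}_2$. Since $p_1 \in G_P$, there is some $p_2 \leq p_1$ such that for some $\hat{q}_3$, $\langle p_2, \hat{q}_3 \rangle \in G$. By the definition of $G_Q$, there is some $p_3$ such that $\langle p_3, \hat{q}_1 \rangle \in G$, and since $G$ is directed, there is some $\langle p_4, \hat{q}_4 \rangle \in G$ with $\langle p_4, \hat{q}_4 \rangle \leq \langle p_3, \hat{q}_1 \rangle, \langle p_2, \hat{q}_3 \rangle$. Since $G$ is closed and $\langle p_4, \hat{q}_4 \rangle \leq \langle p_4, \hat{q}_2 \rangle$, we have $\hat{q}_2[G_P] \in G_Q$.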


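$G_Q$ is directed


Suppose $\hat{q}_1[G_P], \hat{q}_2[G_P] \in G_Q$. Then for some $p_1, p_2$, $\langle p_1, \hat{q}_1 \rangle, \langle p_2, \hat{q}_2 \rangle \in G$, and since $G$ is directed, there is some $\langle p_3, \hat{q}_3 \rangle \in G$ such that $\langle p_3, \hat{q}_3 \rangle \leq \langle p_1, \hat{q}_1 \rangle, \langle p_2, \hat{q}_2 \rangle$. Then $\hat{q}_3[G_P] \in G_Q$ and since $p_3 \in G_P$ and $p_3 \Vdash \hat{q}_3 \leq \hat{q}_1, \hat{q}_2$, we have $\hat{q}_3[G_P] \leq \hat{q}_1[G_P], \hat{q}_2[G_P]$.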


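$G_Q$ is generic


Let $D$ be a dense subset of $\hat{Q}[G_P]$ (in $\mathfrak{M}[G_P]$). Let $\hat{D}$ be a $P$-name for $D$, and let $p_1 \in G_P$ be such that $p_1 \Vdash \hat{D}$ is dense. By the definition of $G_P$, there is some $p_2 \leq p_1$ such that $\langle p_2, \hat{q}_2 \rangle \in G$ for some $\hat{q}_2$. Let $D' = \{\langle p, \hat{q} \rangle \mid p \Vdash \hat{q} \in \hat{D} \wedge p \leq p_2\}$.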
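Lemma: $D'$ is dense (in $G$) above $\langle p_2, \hat{q}_2 \rangle$


Take any $\langle p, \hat{q} \rangle \in P * Q$ such that $\langle p, \hat{q} \rangle \leq \langle p_2, \hat{q}_2 \rangle$. Then $p \Vdash \hat{D}$ is dense, and therefore there is some $\hat{q}_3$ such that $p \Vdash \hat{q}_3 \in \hat{D}$ and $p \Vdash \hat{q}_3 \leq \hat{q}$. So $\langle p, \hat{q}_3 \rangle \leq \langle p, \hat{q} \rangle$ and $\langle p, \hat{q}_3 \rangle \in D'$.


Take any $\langle p_3, \hat{q}_3 \rangle \in D' \cap G$. Then $p_3 \in G_P$, so $\hat{q}_3[G_P] \in D$, and by the definition of $G_Q$, $\hat{q}_3[G_P] \in G_Q$.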


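$G_P * G_Q = G$


If $G$ is a generic subset of $P * Q$, observe that:


\[ G_P * G_Q = \{ \langle p, \hat{q} \rangle \mid p' \leq p \wedge \langle p', \hat{q}' \rangle \in G \wedge \langle p_0, \hat{q} \rangle \in G \text{ for some } p', \hat{q}', p_0 \} \]


If $\langle p, \hat{q} \rangle \in G$ then obviously this holds, so $G \subseteq G_P * G_Q$. Conversely, if $\langle p, \hat{q} \rangle \in G_P * G_Q$ then there exist $p'$, $\hat{q}'$ and $p_0$ such that $\langle p', \hat{q}' \rangle, \langle p_0, \hat{q} \rangle \in G$, and since $G$ is directed, some $\langle p_1, \hat{q}_1 \rangle \in G$ such that $\langle p_1, \hat{q}_1 \rangle \leq \langle p', \hat{q}' \rangle, \langle p_0, \hat{q} \rangle$. But then $p_1 \leq p$ and $p_1 \Vdash \hat{q}_1 \leq \hat{q}$, and since $G$ is closed, $\langle p, \hat{q} \rangle \in G$.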


$(G * H)_P = G$


Assume that $G$ is generic in $P$ and $H$ is generic in $\hat{Q}[G]$.


Suppose $p \in (G * H)_P$. Then there is some $p' \in P$ and some $\hat{q} \in Q$ such that $p' \leq p$ and $\langle p', \hat{q} \rangle \in G * H$. By the definition of $G * H$, $p' \in G$, and then since $G$ is closed $p \in G$.


Conversely, suppose $p \in G$. Then (since $H$ is non-trivial), $\langle p, \hat{q} \rangle \in G * H$ for some $\hat{q}$, and therefore $p \in (G * H)_P$.


$(G * H)_Q = H$


Assume that $G$ is generic in $P$ and $H$ is generic in $\hat{Q}[G]$.


Given any $q \in H$, there is some $\hat{q} \in Q$ such that $\hat{q}[G] = q$, and so there is some $p$ such that $\langle p, \hat{q} \rangle \in G * H$, and therefore $\hat{q}[G] \in (G * H)_Q$.


On the other hand, if $q \in (G * H)_Q$ then there is some $\langle p, \hat{q} \rangle \in G * H$ with $\hat{q}[G] = q$, and therefore $q \in H$.
34.129 complete partial orders do not add small subsets


Suppose $P$ is a $\kappa$-complete partial order in $\mathfrak{M}$. Then for any generic subset $G$, $\mathfrak{M}[G]$ contains no bounded subsets of $\kappa$ which are not in $\mathfrak{M}$.
34.130 proof of complete partial orders do not add small subsets


Take any $x \in \mathfrak{M}[G]$, $x \subseteq \kappa$. Let $\hat{x}$ be a name for $x$. There is some $p \in G$ such that


$p \Vdash \hat{x}$ is a subset of $\kappa$ bounded by $\lambda < \kappa$


Outline:


For any $q \leq p$, we construct by induction a series of elements $q_\alpha$ stronger than $p$. Each $q_\alpha$ will determine whether or not $\alpha \in \hat{x}$. Since we know the subset is bounded below $\kappa$, we can use the fact that $P$ is $\kappa$ complete to find a single element stronger than $q$ which fixes the exact value of $\hat{x}$. Since the series is definable in $\mathfrak{M}$, so is $\hat{x}$, so we can conclude that above any element $q \leq p$ is an element which forces $\hat{x} \in \mathfrak{M}$. Then $p$ also forces $\hat{x} \in \mathfrak{M}$, completing the proof.


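Details:


Since forcing can be described within $\mathfrak{M}$, $S = \{q \in P \mid q \Vdash \hat{x} \in V\}$ is a set in $\mathfrak{M}$. Then, given any $q \leq p$, we can define $q_0 = q$ and for any $q_\alpha$ ($\alpha < \lambda$), $q_{\alpha+1}$ is an element of $P$ stronger than $q_\alpha$ such that either $q_{\alpha+1} \Vdash \alpha + 1 \in \hat{x}$ or $q_{\alpha+1} \Vdash \alpha + 1 \notin \hat{x}$. For limit $\alpha$, let $q'_\alpha$ be any upper bound of $q_\beta$ for $\beta < \alpha$ (this exists since $P$ is $\kappa$-complete and $\alpha < \kappa$), and let $q_\alpha$ be stronger than $q'_\alpha$, and either $q_{\alpha+1} \Vdash \alpha \in \hat{x}$ or $q_{\alpha+1} \Vdash \alpha \notin \hat{x}$. Finally let $q^*$ be the upper bound of $q_\alpha$ for $\alpha < \lambda$. $q^* \in P$ since $P$ is $\kappa$-complete.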


Note that these elements all exist since for any $p \in P$ and any (first-order) sentence $\phi$ there is some $q \leq p$ such that $q$ forces either $\phi$ or $\lnot\phi$.


$q^*$ not only forces that $\hat{x}$ is a bounded subset of $\kappa$, but for every ordinal it forces whether or not that ordinal is contained in $\hat{x}$. But the set $\{\alpha < \lambda \mid q^* \Vdash \alpha \in \hat{x}\}$ is definable in $\mathfrak{M}$, and is of course equal to $\hat{x}[G^*]$ in any generic $G^*$ containing $q^*$. So $q^* \Vdash \hat{x} \in \mathfrak{M}$.


Since this holds for any element stronger than $p$, it follows that $p \Vdash \hat{x} \in \mathfrak{M}$, and therefore $\hat{x}[G] \in \mathfrak{M}$.
34.131 ♦ is equivalent to ♣ and continuum hypothesis


If $S$ is a stationary subset of $\kappa$ and $\lambda < \kappa$ implies $2^\lambda \leq \kappa$ then


\[ \diamondsuit_S \leftrightarrow \clubsuit_S \]


Moreover, this is best possible: $\lnot\diamondsuit_S$ is consistent with $\clubsuit_S$.
34.132 Levy collapse


Given any cardinals $\kappa$ and $\lambda$ in $\mathfrak{M}$, we can use the Levy collapse to give a new model $\mathfrak{M}[G]$ where $\lambda = \kappa$. Let $P = \mathrm{Levy}(\kappa, \lambda)$ be the set of partial functions $f : \kappa \rightarrow \lambda$ with $|\operatorname{dom}(f)| < \kappa$. These each give partial information about a function $F$ which collapses $\lambda$ onto $\kappa$.


Given any generic subset $G$ of $P$, $\mathfrak{M}[G]$ has a set $G$, so let $F = \bigcup G$. Each element of $G$ is a partial function, and they are all compatible, so $F$ is a function. $\operatorname{dom}(F) = \kappa$ since for each $\alpha < \kappa$ the set of $f \in P$ such that $\alpha \in \operatorname{dom}(f)$ is dense (given any function without $\alpha$, it is trivial to add $(\alpha, 0)$, giving a stronger function which includes $\alpha$). Also $\operatorname{range}(F) = \lambda$ since the set of $f \in P$ such that $\alpha < \lambda$ is in the range of $f$ is again dense (the domain of each $f$ is bounded, so if $\beta$ is larger than any element of $\operatorname{dom}(f)$, $f \cup \{(\beta, \alpha)\}$ is stronger than $f$ and includes $\alpha$ in its range).
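

The bookkeeping in this argument can be mimicked with finite partial functions. The sketch below is only an analogy: ordinals are modelled by non-negative integers, conditions by Python dictionaries, and the function names are our own; it shows the union of compatible conditions and the two density steps, not genuine genericity.

```python
def compatible(f, g):
    """Two partial functions are compatible when they agree on their common domain."""
    return all(f[a] == g[a] for a in f.keys() & g.keys())

def union(conditions):
    """Union of pairwise compatible partial functions, mirroring F = union of G."""
    F = {}
    for f in conditions:
        assert compatible(F, f), "conditions must be pairwise compatible"
        F.update(f)
    return F

def extend_domain(f, a):
    """Density step for the domain: add (a, 0) when a is missing from dom(f)."""
    return f if a in f else {**f, a: 0}

def extend_range(f, a):
    """Density step for the range: send a point beyond dom(f) to the value a."""
    if a in f.values():
        return f
    b = max(f, default=-1) + 1      # strictly larger than every element of dom(f)
    return {**f, b: a}

G = [{0: 5}, {0: 5, 1: 2}, {0: 5, 1: 2, 2: 7}]
F = union(G)
```

Each extension contains the condition it started from, so it is a stronger condition in the sense used above.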


So $F$ is a surjective function from $\kappa$ to $\lambda$, and $\lambda$ is collapsed in $\mathfrak{M}[G]$. In addition, $|\mathrm{Levy}(\kappa, \lambda)| = \lambda$, so it satisfies the $\lambda^+$ chain condition and therefore $\lambda^+$ is not collapsed, and becomes $\kappa^+$ (since for any ordinal between $\lambda$ and $\lambda^+$ there is already a surjective function to it from $\lambda$).


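We can generalize this by forcing with $P = \mathrm{Levy}(\kappa, <\lambda)$, the set of partial functions $f : \lambda \times \kappa \rightarrow \lambda$ such that $f(0, \alpha) = 0$, $|\operatorname{dom}(f)| < \kappa$ and if $\alpha > 0$ then $f(\alpha, i) < \alpha$. In essence, this is the union of $\mathrm{Levy}(\kappa, \eta)$ for each $\kappa < \eta < \lambda$.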


In $\mathfrak{M}[G]$, define $F = \bigcup G$ and $F_\alpha(\beta) = F(\alpha, \beta)$. Each $F_\alpha$ is a function from $\kappa$ to $\alpha$, and by the same argument as above $F_\alpha$ is both total and surjective. Moreover, it can be shown that $P$ satisfies the $\lambda$ chain condition, so $\lambda$ does not collapse and $\lambda = \kappa^+$.
34.133 proof of ♦ is equivalent to ♣ and continuum hypothesis


The proofs that $\diamondsuit_S$ implies both $\clubsuit_S$ and that for every $\lambda < \kappa$, $2^\lambda \leq \kappa$ are given in the entries for $\diamondsuit_S$ and $\clubsuit_S$.


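Let $A = \langle A_\alpha \rangle_{\alpha \in S}$ be a sequence which satisfies $\clubsuit_S$.


Since there are only $\kappa$ bounded subsets of $\kappa$, there is a surjective function $f : \kappa \rightarrow \mathrm{Bounded}(\kappa) \times \kappa$ where $\mathrm{Bounded}(\kappa)$ is the bounded subsets of $\kappa$. Define a sequence $B = \langle B_\alpha \rangle_{\alpha<\kappa}$ by $B_\alpha = f(\alpha)$ if $\sup(B_\alpha) < \alpha$ and $\emptyset$ otherwise. Since the set of $(B_\alpha, \lambda) \in \mathrm{Bounded}(\kappa) \times \kappa$ such that $B_\alpha = T$ is unbounded for any bounded subset $T$, it follows that every bounded subset of $\kappa$ occurs $\kappa$ times in $B$.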


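We can define a new sequence, $D = \langle D_\alpha \rangle_{\alpha \in S}$ such that $x \in D_\alpha \leftrightarrow x \in B_\beta$ for some $\beta \in A_\alpha$. We can show that $D$ satisfies $\diamondsuit_S$.


First, for any $\alpha$, $x \in D_\alpha$ means that $x \in B_\beta$ for some $\beta \in A_\alpha$, and since $B_\beta \subseteq \beta \in A_\alpha \subseteq \alpha$, we have $D_\alpha \subseteq \alpha$.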


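Next take any $T \subseteq \kappa$. We consider two cases:


$T$ is bounded


The set of $\alpha$ such that $T = B_\alpha$ forms an unbounded sequence $T'$, so there is a stationary $S' \subseteq S$ such that $\alpha \in S' \rightarrow A_\alpha \subset T'$. For each such $\alpha$, $x \in D_\alpha \leftrightarrow x \in B_i$ for some $i \in A_\alpha \subset T'$. But each such $B_i$ is equal to $T$, so $D_\alpha = T$.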


T is unbounded 


We define a function $j : \kappa \rightarrow \kappa$ as follows:


• $j(0) = 0$


• To find $j(\alpha)$, take $X \cap \{j(\beta) \mid \beta < \alpha\}$. This is a bounded subset of $\kappa$, so is equal to an unbounded series of elements of $B$. Take $j(\alpha) = \gamma$, where $\gamma$ is the least number greater than any element of $\{\alpha\} \cup \{j(\beta) \mid \beta < \alpha\}$ such that $B_\gamma = X \cap \{j(\beta) \mid \beta < \alpha\}$.


Let $T' = \operatorname{range}(j)$. This is obviously unbounded, and so there is a stationary $S' \subseteq S$ such that $\alpha \in S' \rightarrow A_\alpha \subset T'$.


Next, consider $C$, the set of ordinals less than $\kappa$ closed under $j$. Clearly it is unbounded, since if $\lambda < \kappa$ then $j(\lambda)$ includes $j(\alpha)$ for $\alpha < \lambda$, and so induction gives an ordinal greater than $\lambda$ closed under $j$ (essentially the result of applying $j$ an infinite number of times). Also, $C$ is closed: take any $c \subseteq C$ and suppose $\sup(c \cap \alpha) = \alpha$. Then for any $\beta < \alpha$, there is some $\gamma \in c$ such that $\beta < \gamma < \alpha$ and therefore $j(\beta) < \gamma$. So $\alpha$ is closed under $j$, and therefore contained in $C$.


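Since $C$ is a club, $C' = C \cap S'$ is stationary. Suppose $\alpha \in C'$. Then $x \in D_\alpha \leftrightarrow x \in B_\beta$ where $\beta \in A_\alpha$. Since $\alpha \in S'$, $\beta \in \operatorname{range}(j)$, and therefore $B_\beta \subseteq T$. Next take any $x \in T \cap \alpha$. Since $\alpha \in C$, it is closed under $j$, hence there is some $\gamma \in \alpha$ such that $j(x) \in \gamma$. Since $\sup(A_\alpha) = \alpha$, there is some $\eta \in A_\alpha$ such that $\gamma < \eta$, so $j(x) \in \eta$. Since $\eta \in A_\alpha$, $B_\eta \subseteq D_\alpha$, and since $\eta \in \operatorname{range}(j)$, $j(\delta) \in B_\eta$ for any $\delta < j^{-1}(\eta)$, and in particular $x \in B_\eta$. Since we showed above that $D_\alpha \subseteq \alpha$, we have $D_\alpha = T \cap \alpha$ for any $\alpha \in C'$.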
34.134 Martin’s axiom


For any cardinal $\kappa$, Martin’s Axiom for $\kappa$ ($MA_\kappa$) states that if $P$ is a partial order satisfying ccc then given any set of $\kappa$ dense subsets of $P$, there is a directed subset intersecting each such subset. Martin’s Axiom states that $MA_\kappa$ holds for every $\kappa < 2^{\aleph_0}$.
34.135 Martin’s axiom and the continuum hypothesis


$MA_{\aleph_0}$ always holds


Given a countable collection of dense subsets of a partial order, we can select a sequence $\langle p_n \rangle_{n<\omega}$ such that $p_n$ is in the $n$-th dense subset, and $p_{n+1} \leq p_n$ for each $n$. Therefore CH implies MA.
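

The selection of the sequence $p_n$ can be sketched in code. The example below is our own toy instance, not part of the entry: conditions are finite bit strings ordered by extension, the $n$-th dense set is assumed to be the set of strings of length greater than $n$, and only finitely many steps are carried out.

```python
def extends(p, q):
    """p <= q for bit strings: p starts with q (longer strings are stronger)."""
    return p[:len(q)] == q

def densify(n, p):
    """Return an element of D_n = {strings of length > n} lying below p."""
    missing = max(0, n + 1 - len(p))
    return p + (0,) * missing

def descending_chain(start, steps):
    """Build p_0 >= p_1 >= ... with p_n in the n-th dense set."""
    chain, p = [], start
    for n in range(steps):
        p = densify(n, p)           # p now lies in D_n and below the previous element
        chain.append(p)
    return chain

chain = descending_chain((), 4)
```

The upward closure of such a chain is a directed set meeting every dense set that was visited, which is the role played by the sequence in the argument above.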


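If $MA_\kappa$ then $2^{\aleph_0} > \kappa$, and in fact $2^\kappa = 2^{\aleph_0}$


$\kappa \geq \aleph_0$, so $2^\kappa \geq 2^{\aleph_0}$, hence it will suffice to find a surjective function from $P(\aleph_0)$ to $P(\kappa)$. Let $A = \langle A_\alpha \rangle_{\alpha<\kappa}$ be a sequence of infinite subsets of $\omega$ such that for any $\alpha \neq \beta$, $A_\alpha \cap A_\beta$ is finite.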
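

One standard way to obtain such an almost disjoint family, which this entry does not spell out, is to assign to each infinite binary sequence the set of codes of its finite initial segments. The sketch below works with truncated sequences only; the function name `branch_set` and the particular coding are our own assumptions.

```python
def branch_set(bits):
    """Codes of all initial segments of a (truncated) binary sequence.

    A segment b_1 ... b_k is coded by the integer with binary digits 1 b_1 ... b_k,
    so distinct segments receive distinct codes.
    """
    code = 1                        # code of the empty initial segment
    codes = {code}
    for b in bits:
        code = 2 * code + b         # append one bit to the coded segment
        codes.add(code)
    return codes

x = (0, 1, 0, 1)
y = (0, 1, 1, 1)
common = branch_set(x) & branch_set(y)
```

Two different sequences share only the codes of their common initial segments, so however far they are continued the intersection stays finite, while each full set is infinite.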


Given any subset $S \subseteq \kappa$ we will construct a function $f : \omega \rightarrow \{0, 1\}$ such that a unique $S$ can be recovered from each $f$. $f$ will have the property that if $i \in S$ then $f(a) = 0$ for finitely many elements $a \in A_i$, and if $i \notin S$ then $f(a) = 0$ for infinitely many elements of $A_i$.


Let $P$ be the partial order (under inclusion) such that each element $p \in P$ satisfies:


• $p$ is a partial function from $\omega$ to $\{0, 1\}$


• There exist $i_1, \ldots, i_n \in S$ such that for each $j < n$, $A_{i_j} \subseteq \operatorname{dom}(p)$


• There is a finite subset of $\omega$, $w_p$, such that $w_p = \operatorname{dom}(p) - \bigcup_{j<n} A_{i_j}$


• For each $j < n$, $p(a) = 0$ for finitely many elements of $A_{i_j}$


This satisfies ccc. To see this, consider any uncountable sequence $S = \langle p_\alpha \rangle_{\alpha<\omega_1}$ of elements of $P$. There are only countably many finite subsets of $\omega$, so there is some $w \subseteq \omega$ such that $w = w_p$ for uncountably many $p \in S$ and $p \restriction w$ is the same for each such element. Since each of these functions’ domains covers only a finite number of the $A_\alpha$, and is 1 on all but a finite number of elements in each, there are only a countable number of different combinations available, and therefore two of them are compatible.


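Consider the following groups of dense subsets:


• $D_n = \{p \in P \mid n \in \operatorname{dom}(p)\}$ for $n < \omega$. This is dense since any $p$ not already
in $D_n$ can be extended to one which is by adding $(n, 1)$.


• $D_\alpha = \{p \in P \mid \operatorname{dom}(p) \supseteq A_\alpha\}$ for $\alpha \in S$. This is dense since if $p \notin D_\alpha$ then
$p \cup \{(a, 1) \mid a \in A_\alpha \setminus \operatorname{dom}(p)\}$ is.


• For each $\alpha \notin S$ and $n < \omega$, $D_{n,\alpha} = \{p \in P \mid p(m) = 0 \text{ for some } m > n \text{ with } m \in A_\alpha\}$. This
is dense since if $p \notin D_{n,\alpha}$ then $\operatorname{dom}(p) \cap A_\alpha = A_\alpha \cap (w_p \cup \bigcup_j A_{i_j})$. But $w_p$ is finite,
and the intersection of $A_\alpha$ with any other $A_i$ is finite, so this intersection is finite,
and hence bounded by some $m$. $A_\alpha$ is infinite, so there is some $x \in A_\alpha$ with $x > m$ and $x > n$. So
$p \cup \{(x, 0)\} \in D_{n,\alpha}$.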


By $\mathrm{MA}_\kappa$, given any set of $\kappa$ dense subsets of $P$, there is a generic $G$ which intersects all of
them. There are a total of $\aleph_0 + |S| + (\kappa - |S|) \cdot \aleph_0 = \kappa$ dense subsets in these three groups,
and hence some generic $G$ intersecting all of them. Since $G$ is directed, $g = \bigcup G$ is a partial
function from $\omega$ to $\{0,1\}$. Since for each $n < \omega$, $G \cap D_n$ is non-empty, $n \in \operatorname{dom}(g)$, so $g$ is
a total function. Since $G \cap D_\alpha$ for $\alpha \in S$ is non-empty, there is some element of $G$ whose
domain contains all of $A_\alpha$ and is 0 on a finite number of them, hence $g(a) = 0$ for a finite
number of $a \in A_\alpha$. Finally, since $G \cap D_{n,\alpha}$ is non-empty for each $n < \omega$ and $\alpha \notin S$, the set of $n \in A_\alpha$ such
that $g(n) = 0$ is unbounded, and hence infinite. So $g$ is as promised, and $2^\kappa = 2^{\aleph_0}$.
34.136 Martin's axiom is consistent


If $\kappa$ is an uncountable strong limit cardinal such that for any $\lambda < \kappa$, $\kappa^\lambda = \kappa$ then it is
consistent that $2^{\aleph_0} = \kappa$ and MA. This is shown by using finite support iterated forcing to
construct a model of ZFC in which this is true. Historically, this proof was the motivation
for developing iterated forcing.


Outline 


The proof uses the convenient fact that $\mathrm{MA}_\kappa$ holds as long as it holds for all partial orders
smaller than $\kappa$. Given the conditions on $\kappa$, there are at most $\kappa$ names for these partial orders.
At each step in the iteration we force with one of these names. The result is that the actual
generic set we add intersects every dense subset of every partial order.


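Construction of $P_\kappa$


$\hat{Q}_\alpha$ will be constructed by induction with three conditions: $|P_\alpha| \le \kappa$ for all $\alpha < \kappa$, $\Vdash_{P_\alpha} \hat{Q}_\alpha \subseteq M$,
and $P_\alpha$ satisfies the ccc. Note that a partial ordering on a cardinal $\lambda < \kappa$ is a function
from $\lambda \times \lambda$ to $\{0,1\}$, so there are at most $2^\lambda < \kappa$ of them. Since a canonical name for a
partial ordering of a cardinal is just a function from $P_\alpha$ to that cardinal, there are at most
$\kappa^\lambda \le \kappa$ of them.


At each of the $\kappa$ steps, we want to deal with one of these possible partial orderings, so we
need to partition the $\kappa$ steps into $\kappa$ steps for each of the $\kappa$ cardinals less than $\kappa$. In addition,
we need to include every $P_\gamma$ name for any level $\gamma$. Therefore, we partition $\kappa$ into $(S_{\gamma,\delta})_{\gamma,\delta<\kappa}$ for
each cardinal $\delta$, with each $S_{\gamma,\delta}$ having cardinality $\kappa$ and the added condition that $\eta \in S_{\gamma,\delta}$
implies $\eta \ge \gamma$. Then each $P_\gamma$ name for a partial ordering of $\delta$ is assigned some index $\eta \in S_{\gamma,\delta}$,
and that partial order will be dealt with at stage $Q_\eta$.


Formally, given $\hat{Q}_\beta$ for $\beta < \alpha$, $P_\alpha$ can be constructed and the $P_\alpha$ names for partial orderings
of each cardinal $\delta$ enumerated by the elements of $S_{\alpha,\delta}$. $\alpha \in S_{\gamma_\alpha,\delta_\alpha}$ for some $\gamma_\alpha$ and $\delta_\alpha$, and
$\alpha \ge \gamma_\alpha$, so some canonical $P_{\gamma_\alpha}$ name for a partial order $\hat{\le}_\alpha$ of $\delta_\alpha$ has already been assigned
to $\alpha$.


Since $\hat{\le}_\alpha$ is a $P_{\gamma_\alpha}$ name, it is also a $P_\alpha$ name, so $\hat{Q}_\alpha$ can be defined as $(\delta_\alpha, \hat{\le}_\alpha)$ if $\Vdash_{P_\alpha} (\delta_\alpha, \hat{\le}_\alpha)$
satisfies the ccc and by the trivial partial order $(1, \{(1,1)\})$ otherwise. Obviously this
satisfies the ccc, and so $P_{\alpha+1}$ does as well. Since $\hat{Q}_\alpha$ is either trivial or a cardinal together
with a canonical name, $\Vdash_{P_\alpha} \hat{Q}_\alpha \subseteq M$. Finally, $|P_{\alpha+1}| \le \sum_n |\alpha|^n \cdot (\sup_i |\hat{Q}_i|)^n \le \kappa$.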


Proof that $\mathrm{MA}_\lambda$ holds for $\lambda < \kappa$
Lemma: It suffices to show that $\mathrm{MA}_\lambda$ holds for partial orders with size $\le \lambda$


Suppose $P$ is a partial order with $|P| > \kappa$ and let $(D_\alpha)_{\alpha<\kappa}$ be dense subsets of $P$. Define
functions $f_\alpha : P \to D_\alpha$ for $\alpha < \kappa$ with $f_\alpha(p) \ge p$ (obviously such elements exist since $D_\alpha$ is
dense). Let $g : P \times P \to P$ be a function such that $g(p,q) \ge p, q$ whenever $p$ and $q$ are
compatible. Then pick some element $q \in P$ and let $Q$ be the closure of $\{q\}$ under the $f_\alpha$ and $g$
with the same ordering as $P$ (restricted to $Q$).


Since there are only $\kappa$ functions being used, it must be that $|Q| \le \kappa$. If $p \in Q$ then $f_\alpha(p) \ge p$
and clearly $f_\alpha(p) \in Q \cap D_\alpha$, so each $D_\alpha \cap Q$ is dense in $Q$. In addition, $Q$ is ccc: if $A$ is an
antichain in $Q$ and $p_1, p_2 \in A$ then $p_1, p_2$ are incompatible in $Q$. But if they were compatible
in $P$ then $g(p_1, p_2) \ge p_1, p_2$ would be an element of $Q$, so they must be incompatible in $P$.
Therefore $A$ is an antichain in $P$, and therefore must have countable cardinality, since $P$
satisfies the ccc.


By assumption, there is a directed $G \subseteq Q$ such that $G \cap (D_\alpha \cap Q) \ne \emptyset$ for each $\alpha < \kappa$, and
therefore $\mathrm{MA}_\lambda$ holds in full.


Now we must prove that, if $G$ is a generic subset of $P_\kappa$, $R$ some partial order with $|R| \le \lambda$
and $(D_\alpha)_{\alpha<\lambda}$ are dense subsets of $R$ then there is some directed subset of $R$ intersecting each
$D_\alpha$.


If $|R| < \lambda$ then $\lambda$ additional elements can be added greater than any other element of $R$ to
make $|R| = \lambda$, and then since there is an order isomorphism into some partial order of $\lambda$,
assume $R$ is a partial ordering of $\lambda$. Then let $D = \{(\alpha, \beta) \mid \alpha \in D_\beta\}$.


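Take canonical names so that $R = \hat{R}[G]$, $D = \hat{D}[G]$ and $D_i = \hat{D}_i[G]$ for each $i < \lambda$ and:


$\Vdash_{P_\kappa}$ $\hat{R}$ is a partial ordering satisfying ccc and
$\hat{D} \subseteq \lambda \times \lambda$ and
$\hat{D}_\alpha$ is dense in $\hat{R}$


For any $\alpha, \beta$ there is a maximal antichain $D_{\alpha,\beta} \subseteq P_\kappa$ such that if $p \in D_{\alpha,\beta}$ then either
$p \Vdash_{P_\kappa} \alpha \le_{\hat{R}} \beta$ or $p \Vdash_{P_\kappa} \alpha \not\le_{\hat{R}} \beta$, and another maximal antichain $E_{\alpha,\beta} \subseteq P_\kappa$ such that if
$p \in E_{\alpha,\beta}$ then either $p \Vdash_{P_\kappa} (\alpha, \beta) \in \hat{D}$ or $p \Vdash_{P_\kappa} (\alpha, \beta) \notin \hat{D}$. These antichains determine the
value of those two formulas.


Then, since $\kappa^{\operatorname{cf} \kappa} > \kappa$ and $\kappa^\mu = \kappa$ for $\mu < \kappa$, it must be that $\operatorname{cf} \kappa = \kappa$, so $\kappa$ is regular. Then
$\gamma = \sup(\{\alpha + 1 \mid \alpha \in \operatorname{dom}(p), p \in \bigcup_{\alpha,\beta<\lambda} D_{\alpha,\beta} \cup E_{\alpha,\beta}\}) < \kappa$, so $D_{\alpha,\beta}, E_{\alpha,\beta} \subseteq P_\gamma$, and therefore
the $P_\kappa$ names $\hat{R}$ and $\hat{D}$ are also $P_\gamma$ names.


Lemma: For any $\gamma$, $G_\gamma = \{p \restriction \gamma \mid p \in G\}$ is a generic subset of $P_\gamma$
First, it is directed, since if $p_1 \restriction \gamma, p_2 \restriction \gamma \in G_\gamma$ then there is some $p \in G$ such that
$p \le p_1, p_2$, and therefore $p \restriction \gamma \in G_\gamma$ and $p \restriction \gamma \le p_1 \restriction \gamma, p_2 \restriction \gamma$.


Also, it is generic. If $D$ is a dense subset of $P_\gamma$ then $D_\kappa = \{p \in P_\kappa \mid p \le q \in D\}$ is dense in
$P_\kappa$, since if $p \in P_\kappa$ then there is some $d \in D$ with $d \le p \restriction \gamma$, but then $d$ is compatible with $p$, so $d \cup p \in D_\kappa$.
Therefore there is some $p \in D_\kappa \cap G$, and so $p \restriction \gamma \in D \cap G_\gamma$.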
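Since $\hat{R}$ and $\hat{D}$ are $P_\gamma$ names, $\hat{R}[G] = \hat{R}[G_\gamma] = R$ and $\hat{D}[G] = \hat{D}[G_\gamma] = D$, so


$V[G_\gamma] \models$ $\hat{R}$ is a partial ordering of $\lambda$ satisfying the ccc and
$D_\alpha$ is dense in $R$


Then there must be some $p \in G_\gamma$ such that


$p \Vdash_{P_\gamma}$ $\hat{R}$ is a partial ordering of $\lambda$ satisfying the ccc


Let $A_p$ be a maximal antichain of $P_\gamma$ such that $p \in A_p$, and define $\hat{\le}^*$ as a $P_\gamma$ name with
$(p, m) \in \hat{\le}^*$ for each $m \in \hat{R}$ and $(a, n) \in \hat{\le}^*$ if $n = (\alpha, \beta)$ where $\alpha < \beta < \lambda$ and $p \ne a \in A_p$.
That is, $\hat{\le}^*[G] = R$ when $p \in G$ and $\hat{\le}^*[G] = {\in} \restriction \lambda$ otherwise. Then this is the name for a
partial ordering of $\lambda$, and therefore there is some $\eta \in S_{\gamma,\lambda}$ such that $\hat{\le}^* = \hat{\le}_\eta$ and $\eta \ge \gamma$.


Since $p \in G_\gamma \subseteq G_\eta$, $\hat{Q}_\eta[G_\eta] = \hat{\le}^*[G_\eta] = R$.


Since $P_{\eta+1} = P_\eta * \hat{Q}_\eta$, we know that $G_{Q_\eta} \subseteq Q_\eta$ is generic since forcing with the composition is equivalent to
successive forcing.


Since $D_i \in V[G_\gamma] \subseteq V[G_\eta]$ and is dense, it follows that $D_i \cap G_{Q_\eta} \ne \emptyset$ and since $G_{Q_\eta}$ is a
subset of $R$ in $P_\kappa$, $\mathrm{MA}_\lambda$ holds.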


Proof that $2^{\aleph_0} = \kappa$


The relationship between Martin's axiom and the continuum hypothesis tells us that $2^{\aleph_0} \ge \kappa$.
Since $2^{\aleph_0}$ was less than $\kappa$ in $V$, and since $|P_\kappa| = \kappa$ adds at most $\kappa$ elements, it must be
that $2^{\aleph_0} = \kappa$.
34.137 a shorter proof: Martin's axiom and the continuum hypothesis


This is another, shorter proof for the fact that $\mathrm{MA}_{\aleph_0}$ always holds.


Let $(P, \le)$ be a partially ordered set and $\mathcal{D}$ be a collection of subsets of $(P, \le)$. We recall
that a filter $G$ on $(P, \le)$ is $\mathcal{D}$-generic if $G \cap D \ne \emptyset$ for all $D \in \mathcal{D}$ which are dense in $(P, \le)$.
("Dense" in this context means: $D$ is dense in $(P, \le)$ if for every $p \in P$ there is a $d \in D$
such that $d \le p$.)


Let $(P, \le)$ be a partially ordered set and $\mathcal{D}$ a countable collection of dense subsets of $P$. Then
there exists a $\mathcal{D}$-generic filter $G$ on $P$. Moreover, it can be shown that for every $p \in P$
there is such a $\mathcal{D}$-generic filter $G$ with $p \in G$.


Let $D_1, \ldots, D_n, \ldots$ be the dense subsets in $\mathcal{D}$. Furthermore let $p_0 = p$. Now we can choose
for every $1 \le n < \omega$ an element $p_n \in P$ such that $p_n \le p_{n-1}$ and $p_n \in D_n$. If we now consider
the set $G := \{q \in P : \exists\, n < \omega \text{ s.t. } p_n \le q\}$, then it is easy to check that $G$ is a $\mathcal{D}$-generic
filter on $P$, and obviously $p \in G$. This completes the proof.
34.138 continuum hypothesis


The Continuum Hypothesis states that there is no cardinal number $\kappa$ such that $\aleph_0 < \kappa < 2^{\aleph_0}$.


An equivalent statement is that $\aleph_1 = 2^{\aleph_0}$.
It is known to be independent of the axioms of ZFC.


The continuum hypothesis can also be stated as: there is no subset of the real numbers
which has cardinality strictly between that of the reals and that of the integers. It is from
this that the name comes, since the set of real numbers is also known as the continuum.
34.139 forcing


Forcing is the method used by Paul Cohen to prove the independence of the continuum hypothesis
(CH). In fact, the method was used by Cohen to prove that CH could be violated. The treatment
I give here is VERY informal. I will develop it later. First let me give an example from
algebra.


Suppose we have a field $k$, and we want to add to this field an element $a$ such that $a^2 = -1$.
We see that we cannot simply drop a new $a$ in $k$, since then we are not guaranteed that we
still have a field. Neither can we simply assume that $k$ already has such an element. The
standard way of doing this is to start by adjoining a generic indeterminate $X$, and impose a
constraint on $X$, saying that $X^2 + 1 = 0$. What we do is take the quotient $k[X]/(X^2 + 1)$,
and make a field out of it by taking the quotient field. We then obtain $k(a)$, where $a$ is the
equivalence class of $X$ in the quotient. The general case of this is the theorem of algebra
saying that every nonconstant polynomial $p$ over a field $k$ has a root in some extension field.


We can rephrase this and say that "it is consistent with standard field theory that $-1$ have
a square root".
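The algebraic example can be made concrete. The following is a minimal sketch under the assumption that the ground field $k$ is the rationals (where $X^2 + 1$ is irreducible); the class name `QuotientElement` and its fields are illustrative and not part of the original entry. An element of $k[X]/(X^2+1)$ is stored as the pair of coefficients of $c_0 + c_1 X$.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class QuotientElement:
    """The class of c0 + c1*X in Q[X]/(X^2 + 1)."""
    c0: Fraction
    c1: Fraction

    def __add__(self, other: "QuotientElement") -> "QuotientElement":
        return QuotientElement(self.c0 + other.c0, self.c1 + other.c1)

    def __mul__(self, other: "QuotientElement") -> "QuotientElement":
        # (a + bX)(c + dX) = ac + (ad + bc)X + bd X^2, and X^2 = -1 in the quotient.
        return QuotientElement(
            self.c0 * other.c0 - self.c1 * other.c1,
            self.c0 * other.c1 + self.c1 * other.c0,
        )

    def inverse(self) -> "QuotientElement":
        # c0^2 + c1^2 is nonzero over Q unless the element itself is zero.
        norm = self.c0 ** 2 + self.c1 ** 2
        if norm == 0:
            raise ZeroDivisionError("zero has no inverse")
        return QuotientElement(self.c0 / norm, -self.c1 / norm)


one = QuotientElement(Fraction(1), Fraction(0))
minus_one = QuotientElement(Fraction(-1), Fraction(0))
a = QuotientElement(Fraction(0), Fraction(1))  # the class of X

print(a * a == minus_one)
```

The new element `a` is not chosen from the old field; it is the class of the indeterminate, and the constraint $X^2 + 1 = 0$ is what forces it to square to $-1$.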


When the theory we consider is ZFC, we run into exactly the same problem: we can't just
add a "new" set and pretend it has the required properties, because then we may violate
something else, like foundation. Let $M$ be a transitive model of ZFC, which we call
the ground model. We want to "add a new set" $S$ to $M$ in such a way that the extension
$M'$ has $M$ as a subclass, the properties of $M$ are preserved, and $S \in M'$.


The first step is to "approximate" the new set using elements of $M$. This is the analogue of
finding the irreducible polynomial in the algebraic example. The set $P$ of such "approximations"
can be ordered by how much information the approximations give: let $p, q \in P$, then
$p \le q$ if and only if $p$ "is stronger than" $q$. We call this set a set of forcing conditions.
Furthermore, it is required that the set $P$ itself and the order relation be elements of $M$.


Since $P$ is a partial order, some of its subsets have interesting properties. Consider $P$ as
a topological space with the order topology. A subset $D \subseteq P$ is dense in $P$ if and only if
for every $p \in P$, there is $d \in D$ such that $d \le p$. A filter in $P$ is said to be $M$-generic if
and only if it intersects every one of the dense subsets of $P$ which are in $M$. An $M$-generic
filter in $P$ is also referred to as a generic set of conditions in the literature. In general,
even though $P$ is a set in $M$, generic filters are not elements of $M$.


If $P$ is a set of forcing conditions, and $G$ is a generic set of conditions in $P$, all in the ground
model $M$, then we define $M[G]$ to be the least model of ZFC that contains $G$. In forthcoming
entries I will detail the construction of $M[G]$. The big theorem is this:


Theorem 5. $M[G]$ is a model of ZFC, and has the same ordinals as $M$, and $M \subseteq M[G]$.


The way to prove that we can violate CH using a generic extension is to add many new
"subsets of $\omega$" in the following way: let $M$ be a transitive model of ZFC, and let $(P, \le)$ be
the set (in $M$) of all functions $f$ whose domain is a finite subset of $\aleph_2 \times \aleph_0$, and whose range
is contained in the set $\{0,1\}$. The ordering here is $p \le q$ if and only if $p \supseteq q$. Let $G$ be a generic set of
conditions in $P$. Then $\bigcup G$ is a total function whose domain is $\aleph_2 \times \aleph_0$, and range is $\{0,1\}$.
We can see this $f$ as coding $\aleph_2$ new functions $f_\alpha : \aleph_0 \to \{0,1\}$, $\alpha < \aleph_2$, which are subsets of
$\omega$. These functions are all distinct, and so CH is violated in $M[G]$.
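The ordering on these conditions is easy to experiment with. The following is a minimal sketch, assuming small integer pairs stand in for elements of $\aleph_2 \times \aleph_0$ and that a condition is stored as a dictionary; the function names are illustrative only. It shows the order $p \le q$ iff $p \supseteq q$, and that two conditions have a common extension exactly when they agree wherever both are defined.

```python
from typing import Dict, Tuple

# A condition: a finite partial function from pairs (alpha, n) to {0, 1}.
Condition = Dict[Tuple[int, int], int]


def is_stronger(p: Condition, q: Condition) -> bool:
    """p <= q: p extends q as a function (p contains q as a set of pairs)."""
    return all(key in p and p[key] == value for key, value in q.items())


def compatible(p: Condition, q: Condition) -> bool:
    """p and q agree on the intersection of their domains."""
    return all(q[key] == value for key, value in p.items() if key in q)


def common_extension(p: Condition, q: Condition) -> Condition:
    """The union of two compatible conditions, which is stronger than both."""
    if not compatible(p, q):
        raise ValueError("incompatible conditions have no common extension")
    return {**p, **q}


p = {(0, 0): 1, (1, 0): 0}
q = {(0, 0): 1}
r = {(0, 0): 0}

print(is_stronger(p, q), compatible(p, r))
```

A generic filter only ever contains pairwise compatible conditions, which is why the union of its members is a single well-defined function.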


All this relies on a proper definition of the satisfaction relation in $M[G]$, and the forcing relation,
which will come in a forthcoming entry. Details can be found in Thomas Jech's book Set
Theory.
34.140 generalized continuum hypothesis

The generalized continuum hypothesis states that for any infinite cardinal $\lambda$ there is no
cardinal $\kappa$ such that $\lambda < \kappa < 2^\lambda$.

Equivalently, for every ordinal $\alpha$, $\aleph_{\alpha+1} = 2^{\aleph_\alpha}$.


Like the continuum hypothesis, the generalized continuum hypothesis is known to be independent
of the axioms of ZFC.
34.141 inaccessible cardinals


A limit cardinal $\kappa$ is a strong limit cardinal if for any $\lambda < \kappa$, $2^\lambda < \kappa$.


A regular limit cardinal $\kappa$ is called weakly inaccessible, and a regular strong limit cardinal
is called inaccessible.
34.142 $\diamondsuit$


$\diamondsuit_S$ is a combinatoric principle regarding a stationary set $S \subseteq \kappa$. It holds when there is a
sequence $(A_\alpha)_{\alpha \in S}$ such that each $A_\alpha \subseteq \alpha$ and for any $A \subseteq \kappa$, $\{\alpha \in S \mid A \cap \alpha = A_\alpha\}$ is
stationary.


To get some sense of what this means, observe that for any $\lambda < \kappa$, $\{\lambda\} \subseteq \kappa$, so the set of $\alpha$ with
$A_\alpha = \{\lambda\}$ is stationary (in $\kappa$). More strongly, suppose $\kappa > \lambda$. Then any subset $T \subseteq \lambda$ is
bounded in $\kappa$, so $A_\alpha = T$ on a stationary set. Since $|S| = \kappa$, it follows that $2^\lambda \le \kappa$. Hence
$\diamondsuit_{\aleph_1}$, the most common form (often written as just $\diamondsuit$), implies CH.
34.143 $\clubsuit$


$\clubsuit_S$ is a combinatoric principle weaker than $\diamondsuit_S$. It states that, for $S$ stationary in $\kappa$, there
is a sequence $(A_\alpha)_{\alpha \in S}$ such that $A_\alpha \subseteq \alpha$ and $\sup(A_\alpha) = \alpha$ and with the property that for
each unbounded subset $T \subseteq \kappa$ there is some $A_\alpha \subseteq T$.


Any sequence satisfying $\diamondsuit_S$ can be adjusted so that $\sup(A_\alpha) = \alpha$, so this is indeed a weakened
form of $\diamondsuit_S$.


Any such sequence actually contains a stationary set of $\alpha$ such that $A_\alpha \subseteq T$ for each $T$:
given any club $C$ and any unbounded $T$, construct $\kappa$-sequences $C^*$ and $T^*$ from the
elements of each, such that the $\alpha$-th member of $C^*$ is greater than the $\alpha$-th member of $T^*$,
which is in turn greater than any earlier member of $C^*$. Since both sets are unbounded, this
construction is possible, and $T^*$ is a subset of $T$ still unbounded in $\kappa$. So there is some $\alpha$
such that $A_\alpha \subseteq T^*$, and since $\sup(A_\alpha) = \alpha$, $\alpha$ is also the limit of a subsequence of $C^*$ and
therefore an element of $C$.
34.144 Dedekind infinite


A set $A$ is said to be Dedekind infinite if there is an injective function $f : \omega \to A$, where
$\omega$ denotes the set of natural numbers.


A Dedekind infinite set is certainly infinite, and if the axiom of choice is assumed, then an
infinite set is Dedekind infinite. However, it is consistent with the failure of the axiom of
choice that there is a set which is infinite but not Dedekind infinite.
34.145 Zermelo-Fraenkel axioms


Equality of sets: If $X$ and $Y$ are sets, and $x \in X$ iff $x \in Y$, then $X = Y$.


Pair set: If $X$ and $Y$ are sets, then there is a set $Z$ containing only $X$ and $Y$.


Union over a set: If $X$ is a set, then there exists a set that contains every element of each $x \in X$.


Power set: If $X$ is a set, then there exists a set $P(X)$ with the property that $Y \in P(X)$ iff any element
$y \in Y$ is also in $X$.


Replacement axiom: Let $F(x,y)$ be some formula. If, for all $x$, there
is exactly one $y$ such that $F(x,y)$ is true, then for any set $A$ there exists a set $B$ with the
property that $b \in B$ iff there exists some $a \in A$ such that $F(a, b)$ is true.


Regularity axiom: Let $F(x)$ be some formula. If there is some $x$ that makes $F(x)$ true, then there is a set $Y$
such that $F(Y)$ is true, but for no $y \in Y$ is $F(y)$ true.


Existence of an infinite set: There
exists a non-empty set $X$ with the property that, for any $x \in X$, there is some $y \in X$ such
that $x \subseteq y$ but $x \ne y$.


Ernst Zermelo and Abraham Fraenkel proposed these axioms as a
foundation for what is now called Zermelo-Fraenkel set theory, or ZF. If these axioms are
accepted along with the axiom of choice, the resulting theory is often denoted ZFC.
34.146 class


By a class in modern set theory we mean an arbitrary collection of elements of the universe.


All sets are classes (as they are collections of elements of the universe - which are usually
sets, but could also be urelements), but not all classes are sets. Classes which are not sets
are called proper classes.
The need for this distinction arises from the paradoxes of the so-called naive set theory. In
naive set theory one assumes that to each possible division of the universe into two (disjoint)
and mutually comprehensive parts there corresponds an entity of the universe, a set. This is
the content of Frege's famous fifth axiom, which states that to each second order predicate
$P$ there corresponds a first order object $p$ called the extension of $P$, s.t. $\forall x (P(x) \leftrightarrow x \in p)$.
(Every predicate $P$ divides the universe into two mutually comprehensive and disjoint parts;
namely the part which consists of objects for which $P$ holds and the part consisting of objects
for which $P$ does not hold.)


Speaking in modern terms we may view the situation as follows. Consider a model of set
theory $M$. The interpretation the model gives to $\in$ defines implicitly a function $f : P(M) \to M$.
Seen this way, the fact that not all classes can be sets simply means that we can't
injectively map the powerset of any set into the set itself, which is a famous result by
Cantor. Functions like $f$ here are known as extensors and they have been used in the study
of set theory.


Russell's paradox - which could be seen as a proof of Cantor's theorem about cardinalities
of powersets - shows that Frege's fifth axiom is contradictory; not all classes can be sets.


From here there are two traditional ways to proceed: either through the theory of types or
through some form of limitation of size principle.


The limitation of size principle in its vague form says that all small classes (in the sense of
cardinality) are sets, while all proper classes are very big; "too big" to be sets. The limitation
of size principle can be found in Cantor's work where it is the basis for Cantor's doctrine that
only transfinite collections can be thought of as specific objects (sets), but some collections are
"absolutely infinite", and can't be thought to be comprehended into an object. This can be
given a precise formulation: all classes which are of the same cardinality as the universal
class are too big, and all other classes are small. In fact, this formulation can be used
in von Neumann-Bernays-Gödel set theory to replace the replacement axiom and almost all
other set existence axioms (with the exception of the powerset axiom).


The limitation of size principle can be seen to give rise to extensors of type $P^{<|A|}(A) \to A$.
($P^{<|A|}(A)$ is the set of all subsets of $A$ which are of cardinality less than that of $A$.) This is
not the only possible way to avoid Russell's paradox. We could use an extensor according
to which all classes which are of cardinality less than that of the universe or for which the
cardinality of their complement is less than that of the universe are sets (i.e. map into
elements of the model).


In many set theories there are formally no proper classes; ZFC is an example of just such a
set theory. In these theories one usually means by a proper class an open formula $\Phi$, possibly
with set parameters $a_1, \ldots, a_n$. Notice, however, that these do not exhaust all possible proper
classes that should "really" exist for the universe, as it only allows us to deal with proper
classes that can be defined by means of an open formula with parameters. The theory NBG
formalises this usage: it's conservative over ZFC (as clearly speaking about open formulae
with parameters must be!).
There is a set theory known as Morse-Kelley set theory which allows us to speak about and
to quantify over an extended class of impredicatively defined proper classes that can't be
reduced to simply speaking about open formulae.
34.147 complement


Let $A$ be a subset of $B$. The complement of $A$ in $B$ (denoted $A^\complement$ when the larger set $B$ is
clear from context) is the set difference $B \setminus A$.
34.148 delta system


If $S$ is a set of sets then it is a $\Delta$-system if there is some (possibly empty) set $X$ such
that for any $a, b \in S$, if $a \ne b$ then $a \cap b = X$.
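For finite families the definition can be checked directly. The following is a minimal sketch; the function name `delta_system_root` is illustrative, and returning the empty set for a family with fewer than two distinct members is a convention chosen here (any $X$ would satisfy the definition vacuously in that case).

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Optional


def delta_system_root(family: Iterable[Iterable[int]]) -> Optional[FrozenSet[int]]:
    """Return the common pairwise intersection X if the family is a Delta-system,
    or None if two pairs of distinct members intersect differently."""
    members = {frozenset(a) for a in family}
    roots = {a & b for a, b in combinations(members, 2)}
    if len(roots) > 1:
        return None
    # With fewer than two distinct members there are no pairs to compare.
    return next(iter(roots)) if roots else frozenset()


print(delta_system_root([{1, 2}, {1, 3}, {1, 4}]))
print(delta_system_root([{1, 2}, {2, 3}, {3, 1}]))
```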
34.149 delta system lemma


If $S$ is a set of finite sets such that $|S| = \aleph_1$ then there is an $S' \subseteq S$ such that $|S'| = \aleph_1$ and
$S'$ is a $\Delta$-system.
34.150 diagonal intersection

If $(S_i)_{i<\alpha}$ is a sequence then the diagonal intersection, $\Delta_{i<\alpha} S_i$, is defined to be
$\{\beta < \alpha \mid \beta \in \bigcap_{\gamma<\beta} S_\gamma\}$.


That is, $\beta$ is in $\Delta_{i<\alpha} S_i$ if it is contained in the first $\beta$ members of the sequence.
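For a finite $\alpha$ the definition can be computed directly. The following is a small sketch assuming $\alpha$ is a natural number and the sequence is given as a Python list of sets of natural numbers; the function name is illustrative.

```python
from typing import List, Set


def diagonal_intersection(sets: List[Set[int]]) -> Set[int]:
    """Diagonal intersection of (S_i)_{i < alpha}, where alpha = len(sets).

    beta belongs to the result when beta is in S_gamma for every gamma < beta.
    For beta = 0 the condition is vacuously true.
    """
    alpha = len(sets)
    return {
        beta
        for beta in range(alpha)
        if all(beta in sets[gamma] for gamma in range(beta))
    }


example = [{1, 2, 3}, {2, 3}, {0, 1}, {0, 1, 2, 3}]
print(diagonal_intersection(example))
```

In the example, 3 is excluded because it is missing from $S_2$, even though it lies in $S_0$, $S_1$ and $S_3$: only the members with index below $\beta$ matter.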
34.151 intersection


The intersection of two sets $A$ and $B$ is the set that contains all the elements $x$ such that
$x \in A$ and $x \in B$. The intersection of $A$ and $B$ is written as $A \cap B$.


Example. If $A = \{1, 2, 3, 4, 5\}$ and $B = \{1, 3, 5, 7, 9\}$ then $A \cap B = \{1, 3, 5\}$.


We can also define the intersection of an arbitrary number of sets. If $\{A_j\}_{j \in J}$ is a family of
sets we define the intersection of all of them, denoted $\bigcap_{j \in J} A_j$, as the set consisting of those
elements belonging to all sets $A_j$:


$$\bigcap_{j \in J} A_j = \{x : x \in A_j \text{ for all } j \in J\}.$$
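The example above, and the intersection of an indexed family, can be reproduced with Python's built-in sets. This is a small sketch; representing the family as a dictionary from indices to sets, and rejecting an empty index set, are choices made here for illustration.

```python
from typing import Dict, Hashable, Set


def intersection_of_family(family: Dict[Hashable, Set[int]]) -> Set[int]:
    """Elements belonging to every set A_j of a non-empty indexed family."""
    sets = list(family.values())
    if not sets:
        raise ValueError("the intersection of an empty family is not defined here")
    return set.intersection(*sets)


A = {1, 2, 3, 4, 5}
B = {1, 3, 5, 7, 9}
print(A & B)
print(intersection_of_family({"a": A, "b": B, "c": {1, 5, 10}}))
```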
34.152 multiset


A multiset is a set for which duplicate elements are allowed.
For example, $\{1, 1, 3\}$ is a multiset, but not a set.
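The difference between the two notions can be seen with Python's standard library, where `collections.Counter` records how many times each element occurs. This is only an illustration of the example above, not a formal definition.

```python
from collections import Counter

as_multiset = Counter([1, 1, 3])  # remembers that 1 occurs twice
as_set = set([1, 1, 3])           # duplicates collapse

print(as_multiset[1], sum(as_multiset.values()))
print(as_set)
```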
34.153 proof of delta system lemma


Since there are only $\aleph_0$ possible cardinalities for any element of $S$, there must be some
$n$ such that there are an uncountable number of elements of $S$ with cardinality $n$. Let
$S^* = \{a \in S \mid |a| = n\}$ for this $n$. By induction on $n$, the lemma holds:


If $n = 1$ then each element of $S^*$ is distinct, and has no intersection with the others,
so $X = \emptyset$ and $S' = S^*$.


Suppose $n > 1$. If there is some $x$ which is in an uncountable number of elements of $S^*$ then
take $S^{**} = \{a \setminus \{x\} \mid x \in a \in S^*\}$. Obviously this is uncountable and every element has $n - 1$
elements, so by the induction hypothesis there is some $S' \subseteq S^{**}$ of uncountable cardinality
such that the intersection of any two elements is $X$. Obviously $\{a \cup \{x\} \mid a \in S'\}$ satisfies
the lemma, since the intersection of any two elements is $X \cup \{x\}$.


On the other hand, if there is no such $x$ then we can construct a sequence $(a_i)_{i<\omega_1}$ such that
each $a_i \in S^*$ and for any $i \ne j$, $a_i \cap a_j = \emptyset$ by induction. Take any element for $a_0$, and
given $(a_i)_{i<\alpha}$, since $\alpha$ is countable, $A = \bigcup_{i<\alpha} a_i$ is countable. Obviously each element of
$A$ is in only a countable number of elements of $S^*$, so there are an uncountable number of
elements of $S^*$ which are candidates for $a_\alpha$. Then this sequence satisfies the lemma, since
the intersection of any two elements is $\emptyset$.
34.154 rational number


The rational numbers $\mathbb{Q}$ are the fraction field of the ring $\mathbb{Z}$ of integers. In more elementary
terms, a rational number is a quotient $a/b$ of two integers $a$ and $b$ with $b \ne 0$. Two fractions $a/b$ and
$c/d$ are equivalent if the product of the cross terms is equal:


$$\frac{a}{b} = \frac{c}{d} \iff ad = bc$$

Addition and multiplication of fractions are given by the formulae

$$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}$$

$$\frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}$$

The field of rational numbers is an ordered field under the ordering relation: $a/b \le c/d$ if
the inequality $a \cdot d \le b \cdot c$ holds in the integers (taking the denominators $b$ and $d$ to be positive).
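The formulae above translate directly into integer arithmetic on pairs. The following is a minimal sketch in which a fraction $a/b$ is stored as a tuple `(a, b)`; the function names are illustrative, results are not reduced to lowest terms, and `less_equal` assumes positive denominators. Python's standard `fractions.Fraction` is used only as a cross-check.

```python
from fractions import Fraction
from typing import Tuple

Frac = Tuple[int, int]  # (numerator, denominator), denominator nonzero


def equivalent(x: Frac, y: Frac) -> bool:
    """a/b = c/d exactly when a*d = b*c."""
    (a, b), (c, d) = x, y
    return a * d == b * c


def add(x: Frac, y: Frac) -> Frac:
    """a/b + c/d = (a*d + b*c) / (b*d)."""
    (a, b), (c, d) = x, y
    return (a * d + b * c, b * d)


def mul(x: Frac, y: Frac) -> Frac:
    """(a/b) * (c/d) = (a*c) / (b*d)."""
    (a, b), (c, d) = x, y
    return (a * c, b * d)


def less_equal(x: Frac, y: Frac) -> bool:
    """a/b <= c/d when a*d <= b*c; assumes b > 0 and d > 0."""
    (a, b), (c, d) = x, y
    return a * d <= b * c


print(add((1, 2), (1, 3)), mul((2, 3), (3, 4)))
```

Note that `mul((2, 3), (3, 4))` gives `(6, 12)`, which is a different pair from `(1, 2)` but equivalent to it; this is why rational numbers are equivalence classes of fractions rather than the fractions themselves.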
34.155 saturated (set)


If $p : X \to Y$ is a surjective map, we say that a subset $C \subseteq X$ is saturated (with respect to
$p$) if $C$ contains every set $p^{-1}(\{y\})$ it intersects. Equivalently, $C$ is saturated if it is a union
of fibres $p^{-1}(\{y\})$.
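On a finite set the condition can be tested by brute force. The following is a small sketch assuming $X$ is a finite set of integers and $p$ is given as a Python function; the names `fibre` and `is_saturated` are illustrative.

```python
from typing import Callable, Set


def fibre(p: Callable[[int], int], domain: Set[int], y: int) -> Set[int]:
    """The preimage p^{-1}({y}) inside the given finite domain."""
    return {x for x in domain if p(x) == y}


def is_saturated(C: Set[int], p: Callable[[int], int], domain: Set[int]) -> bool:
    """C (a subset of domain) is saturated if it contains every fibre it meets."""
    return all(fibre(p, domain, p(x)) <= C for x in C)


X = set(range(6))


def mod3(x: int) -> int:
    return x % 3


print(is_saturated({0, 3, 1, 4}, mod3, X))
print(is_saturated({0, 1}, mod3, X))
```

Here the fibres of `mod3` on `X` are `{0, 3}`, `{1, 4}` and `{2, 5}`, so `{0, 3, 1, 4}` is a union of fibres while `{0, 1}` cuts two fibres in half.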
34.156 separation and doubletons axiom


• Separation axiom: If $X$ is a set and $P$ is a condition on sets, there exists a set
$Y$ whose members are precisely the members of $X$ satisfying $P$. Common notation:
$Y = \{A \in X \mid P(A)\}$.


• Doubletons axiom (or Pairs): If $X$ and $Y$ are sets there is a set $Z$ whose only
members are $X$ and $Y$. Common notation: $Z = \{X, Y\}$.
34.157 set


34.157.1 Introduction


A set is a collection, group, or conglomerate.¹


Sets can be of "real" objects or mathematical objects; but the sets themselves are purely
conceptual. This is an important point to note: the set of all cows (for example) does not
physically exist, even though the cows do. The set is a "gathering" of the cows into one
conceptual unit that is not part of physical reality. This makes it easy to see why we can
have sets with an infinite number of elements; even though we may not be able to point out
infinitely many objects in the real world, we can construct conceptual sets which have an infinite number
of elements (see the examples below).


Mathematics is thus built upon sets of purely conceptual, or mathematical, objects. Sets 
are usually denoted by upper-case roman letters (like S). Sets can be defined by listing the 
members, as in 


S = {a,b,c,d} 


Or, a set can be defined from a formula. This type of statement defining a set is of the form


S = {x : P(x)} 


where $S$ is the symbol denoting the set, $x$ is the variable we are introducing to represent a
generic element of the set, and $P(x)$ is some property that is true for values $x$ within $S$ (that
is, $x \in S$ iff $P(x)$ holds). (We denote "and" by comma separated clauses in $P(x)$. Also note
that the "$x :$" portion of the set definition may contain a qualification which narrows values of
$x$ to some other set which is already known.)


Sets are, in fact, completely defined by their elements. If two sets have the same elements,
they are equal. This is called the axiom of extensionality, and it is one of the most
important characteristics of sets that distinguishes them from predicates or properties.


¹ However, not every collection has to be a set (in fact, all collections can't be sets). See the entry on classes for
more details.
The symbol $\in$ denotes membership in a set. For example,


$s \in S$


would be read "$s$ is an element of $S$", or "$S$ contains $s$".


Some examples of sets, with formal definitions, are : 


• The set of all even integers: $\{x \in \mathbb{Z} : 2 \mid x\}$


• The set of all prime numbers: $\{p \in \mathbb{N} : p > 1,\ \forall x \in \mathbb{N}\ (x \mid p \Rightarrow x \in \{1, p\})\}$, where $\Rightarrow$ denotes
implies and $\mid$ denotes divides.


• The set of all real functions of one real parameter: $\{f \mid f : \mathbb{R} \to \mathbb{R}\}$


• The set of all isosceles triangles: $\{\triangle ABC : (\overline{AB} = \overline{BC}) \ne \overline{AC}\}$, where overline
denotes length.


$\mathbb{Z}$, $\mathbb{N}$, and $\mathbb{R}$ are all standard sets: the integers, the natural numbers, and the real numbers,
respectively. These are all infinite sets.


The most basic set is the empty set (denoted $\emptyset$ or $\{\}$).


The astute reader may have noticed that all of our examples of sets utilize sets, which does
not suffice for rigorous definition. We can be more rigorous if we postulate only the empty
set, and define a set in general as anything which one can construct from the empty set and
the ZFC axioms.


All objects in modern mathematics are constructed via sets. 


34.157.2 Set Notions 


An important set notion is cardinality. Cardinality is roughly the same as the intuitive
notion of "size". For sets which have a finite number of elements,
cardinality can be thought of as size. However, intuition breaks down for sets with an infinite
number of elements. For more detail, see the cardinality entry.


Another important set concept is that of subsets. A subset $B$ of a set $A$ is any set which
contains only elements that appear in $A$. Subsets are denoted with the $\subseteq$ symbol, i.e. $B \subseteq A$.
Also useful is the notion of a proper subset, denoted $B \subset A$, which adds the restriction
that $B$ must not be equal to $A$ (for finite sets, this means $B$ has a lower cardinality than $A$).
34.157.3 Set Operations


There are a number of standard (common) operations which are used to manipulate sets,
producing new sets from combinations of existing sets (sometimes with entirely different
types of elements). These standard operations are:


• cartesian product
Chapter 35


O3Exx — Set theory 


35.1 intersection of sets 


Let $X, Y$ be sets. The intersection of $X$ and $Y$, denoted $X \cap Y$, is the set


$$X \cap Y = \{z : z \in X, z \in Y\}$$
Chapter 36


O3F03 — Proof theory, general 


36.1 NJp 


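NJp is a natural deduction proof system for intuitionistic propositional logic. Its only
axiom is $\alpha \Rightarrow \alpha$ for any atomic $\alpha$. Its rules are:


$$\frac{\Gamma \Rightarrow \alpha}{\Gamma \Rightarrow \alpha \vee \beta}\ \frac{\Gamma \Rightarrow \beta}{\Gamma \Rightarrow \alpha \vee \beta}\ (\vee I) \qquad \frac{\Gamma \Rightarrow \alpha \vee \beta \quad \Sigma, \alpha^0 \Rightarrow \phi \quad \Pi, \beta^0 \Rightarrow \phi}{[\Gamma, \Sigma, \Pi] \Rightarrow \phi}\ (\vee E)$$


The syntax $\alpha^0$ indicates that the rule also holds if that formula is omitted.


$$\frac{\Gamma \Rightarrow \alpha \quad \Sigma \Rightarrow \beta}{[\Gamma, \Sigma] \Rightarrow \alpha \wedge \beta}\ (\wedge I) \qquad \frac{\Gamma \Rightarrow \alpha \wedge \beta}{\Gamma \Rightarrow \alpha}\ \frac{\Gamma \Rightarrow \alpha \wedge \beta}{\Gamma \Rightarrow \beta}\ (\wedge E)$$

$$\frac{\Gamma, \alpha^0 \Rightarrow \beta}{\Gamma \Rightarrow \alpha \to \beta}\ (\to I) \qquad \frac{\Gamma \Rightarrow \alpha \to \beta \quad \Sigma \Rightarrow \alpha}{[\Gamma, \Sigma] \Rightarrow \beta}\ (\to E)$$

$$\frac{\Gamma \Rightarrow \bot}{\Gamma \Rightarrow \alpha}\ (\bot_i) \quad \text{where } \alpha \text{ is atomic}$$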
36.2 NKp


NKp is a natural deduction proof system for classical propositional logic. It is identical to
NJp except that it replaces the rule $\bot_i$ with the rule:

$$\frac{\Gamma, \neg\alpha \Rightarrow \bot}{\Gamma \Rightarrow \alpha}\ (\bot_c) \quad \text{where } \alpha \text{ is atomic}$$
36.3 natural deduction


Natural deduction refers to related proof systems for several different kinds of logic, intended
to be similar to the way people actually reason. Unlike many other proof systems, it has
many rules and few axioms. Sequents in natural deduction have only one formula on the
right side.


Typically the rules consist of one pair for each connective, one of which allows the introduction
of that symbol and the other its elimination.


To give one example, the proof rules $\to I$ and $\to E$ are:


$$\frac{\Gamma, \alpha \Rightarrow \beta}{\Gamma \Rightarrow \alpha \to \beta}\ (\to I)$$


and

$$\frac{\Gamma \Rightarrow \alpha \to \beta \quad \Sigma \Rightarrow \alpha}{[\Gamma, \Sigma] \Rightarrow \beta}\ (\to E)$$
36.4 sequent


A sequent represents a formal step in a proof. Typically it consists of two lists of formulas,
one representing the premises and one the conclusions. A typical sequent might be:


$$\phi, \psi \Rightarrow \alpha, \beta$$


This claims that, from premises $\phi$ and $\psi$, either $\alpha$ or $\beta$ must be true. Note that $\Rightarrow$ is not
a symbol in the language, rather it is a symbol in the metalanguage used to discuss proofs.
Also, notice the asymmetry: everything on the left must be true to conclude only one thing
on the right. This does create a different kind of symmetry, since adding formulas to either
side results in a weaker sequent, while removing them from either side gives a stronger one.


Some systems allow only one formula on the right.
Most proof systems provide ways to deduce one sequent from another. These rules are
written with a list of sequents above and below a line. Such a rule indicates that if everything
above the line is true, so is everything under the line. A typical rule is:


$$\frac{\Gamma \Rightarrow \Sigma}{\Gamma, \alpha \Rightarrow \Sigma}$$


This indicates that if we can deduce $\Sigma$ from $\Gamma$, we can also deduce it from $\Gamma$ together with
$\alpha$.


Note that the capital Greek letters are usually used to denote a (possibly empty) list of
formulas. $[\Gamma, \Sigma]$ is used to denote the contraction of $\Gamma$ and $\Sigma$, that is, the list of those
formulas appearing in either $\Gamma$ or $\Sigma$ but with no repeats.
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36.5 sound,, complete 


If Th and Pr are two sets of facts (in particular, a theory of some language and the set of
things provable by some method) we say Pr is sound for Th if $Pr \subseteq Th$. Typically we
have a theory and a set of rules for constructing proofs, and we say the set of rules is sound
(which theory is intended is usually clear from context) since everything they prove is true
(in Th).


If $Th \subseteq Pr$ we say Pr is complete for Th. Again, we usually have a theory and a set of
rules for constructing proofs, and say that the set of rules is complete since everything true
(in Th) can be proven.


Chapter 37


03F07 — Structure of proofs


37.1 induction 


Induction is the name given to a certain kind of proof, and also to a (related) way of defining
a function. For a proof, the statement to be proved has a suitably ordered set of cases.
Some cases (usually one, but possibly zero or more than one), are proved separately, and
the other cases are deduced from those. The deduction goes by contradiction, as we shall
see. For a function, its domain is suitably ordered. The function is first defined on some
(usually nonempty) subset of its domain, and is then defined at other points x in terms of
its values at points y such that y < x.


37.1.1 Elementary proof by induction 


Proof by induction is a variety of proof by contradiction, relying, in the elementary cases,
on the fact that every non-empty set of natural numbers has a least element. Suppose we
want to prove a statement F(n) which involves a natural number n. It is enough to prove:

1) If $n \in \mathbb{N}$, and F(m) is true for all $m \in \mathbb{N}$ such that m < n, then F(n) is true.

or, what is the same thing,

2) If F(n) is false, then F(m) is false for some m < n.


To see why, assume that F(n) is false for some n. Then there is a smallest $k \in \mathbb{N}$ such that
F(k) is false. Then, by hypothesis, F(n) is true for all n < k. By (1), F(k) is true, which is
a contradiction.


(If we don’t regard induction as a kind of proof by contradiction, then we have to think
of it as supplying some kind of sequence of proofs, of unlimited length. That’s not very
satisfactory, particularly for transfinite inductions, which we will get to below.)


Usually the initial case of n = 0, and sometimes a few cases, need to be proved separately,
as in the following example. Write $B_n = \sum_{k=0}^{n} k^2$. We claim


\[ B_n = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6} \quad \text{for all } n \in \mathbb{N}. \]


Let us try to apply (1). We have the inductive hypothesis (as it is called)


\[ B_m = \frac{m^3}{3} + \frac{m^2}{2} + \frac{m}{6} \quad \text{for all } m < n \]


which tells us something if n > 0. In particular, setting m = n − 1,


\[ B_{n-1} = \frac{(n-1)^3}{3} + \frac{(n-1)^2}{2} + \frac{n-1}{6}. \]


Now we just add $n^2$ to each side, and verify that the right side becomes $\frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6}$. This
proves (1) for nonzero n. But if n = 0, the inductive hypothesis is vacuously true, but of no
use. So we need to prove F(0) separately, which in this case is trivial.
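

The sum-of-squares argument can also be checked numerically. The following Python sketch is only an illustration (the function names are ours, not part of the entry): it compares the closed form with direct summation and tests the inductive step "add n^2 to each side" in exact rational arithmetic. A finite check of this kind supports the formula but does not replace the proof.

```python
from fractions import Fraction


def sum_of_squares(n):
    """B_n = 0^2 + 1^2 + ... + n^2, computed by direct summation."""
    return sum(k * k for k in range(n + 1))


def closed_form(n):
    """n^3/3 + n^2/2 + n/6, kept exact by using Fraction."""
    n = Fraction(n)
    return n**3 / 3 + n**2 / 2 + n / 6


def inductive_step_holds(n):
    """For n > 0: adding n^2 to the formula at n - 1 gives the formula at n."""
    return closed_form(n - 1) + n * n == closed_form(n)


# Base case F(0), proved separately in the text.
assert closed_form(0) == sum_of_squares(0) == 0
# Inductive step and the full claim, checked for the first few hundred n.
assert all(inductive_step_holds(n) for n in range(1, 300))
assert all(closed_form(n) == sum_of_squares(n) for n in range(300))
```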


Textbooks sometimes distinguish between weak and strong (or complete) inductive proofs.
A proof that relies on the inductive hypothesis (1) is said to go by strong induction. But in
the sum-of-squares formula above, we needed only the hypothesis F(n − 1), not F(m) for all
m < n. For another example, a proof about the Fibonacci sequence might use just F(n − 2)
and F(n − 1). An argument using only F(n − 1) is referred to as weak induction.


37.1.2 Definition of a function by induction 


Let’s begin with an example, the function $\mathbb{N} \to \mathbb{N}$, $n \mapsto a^n$, where a is some integer > 0.
The inductive definition reads


\[ a^0 = 1 \]
\[ a^n = a(a^{n-1}) \quad \text{for all } n > 0 \]


Formally, such a definition requires some justification, which runs roughly as follows. Let T
be the set of $m \in \mathbb{N}$ for which the following definition "has no problem".


\[ a^0 = 1 \]
\[ a^n = a(a^{n-1}) \quad \text{for } 0 < n \le m \]


We now have a finite sequence $f_m$ on the interval [0, m], for each $m \in T$. We verify that any
$f_l$ and $f_m$ have the same values throughout the intersection of their two domains. Thus we
can define a single function on the union of the various domains. Now suppose $T \neq \mathbb{N}$, and
let k be the least element of $\mathbb{N} - T$. That means that the definition has a problem when
m = k but not when m < k. We soon get a contradiction, so we deduce $T = \mathbb{N}$. That means
that the union of those domains is all of $\mathbb{N}$, i.e. the function $a^n$ is defined, unambiguously,
throughout $\mathbb{N}$.
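

As an illustration of the inductive definition, here is a minimal Python sketch (the names `power` and `power_table` are ours). `power` follows the two defining clauses directly, and `power_table` builds the finite sequence on [0, m] used in the justification, so that one can observe that two such sequences agree on the intersection of their domains.

```python
def power(a, n):
    """a^n for a natural number n, via a^0 = 1 and a^n = a * a^(n-1)."""
    if n == 0:
        return 1
    return a * power(a, n - 1)


def power_table(a, m):
    """The finite sequence f_m on the interval [0, m], built from the base case."""
    values = [1]                          # f_m(0) = a^0 = 1
    for n in range(1, m + 1):
        values.append(a * values[n - 1])  # f_m(n) = a * f_m(n - 1)
    return values


# Two finite sequences agree wherever both are defined.
assert power_table(3, 5) == power_table(3, 8)[:6]
# The recursive function agrees with each finite sequence.
assert [power(3, n) for n in range(6)] == power_table(3, 5)
```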


Another inductively defined function is the Fibonacci sequence, q.v. 


We have been speaking of the inductive definition of a function, rather than just a sequence
(a function on $\mathbb{N}$), because the notions extend with little change to transfinite inductions.
An illustration par excellence of inductive proofs and definitions is Conway’s theory of
surreal numbers. The numbers and their algebraic laws of composition are defined entirely
by inductions which have no special starting cases.


37.1.3 Minor variations of the method 


The reader can figure out what is meant by "induction starting at k", where k is not
necessarily zero. Likewise, the term "downward induction" is self-explanatory.


A common variation of the method is proof by induction on a function of the index n.
Rather than spell it out formally, let me just give an example. Let n be a positive integer
having no prime factors of the form 4m + 3. Then $n = a^2 + b^2$ for some integers a and b.
The usual textbook proof uses induction on a function of n, namely the number of prime
factors of n. The induction starts at 1 (i.e. either n = 2 or prime n = 4m + 1), which in this
instance is the only part of the proof that is not quite easy.


37.1.4 Well-ordered sets 


An ordered set $(S, \le)$ is said to be well-ordered if any nonempty subset of S has a least
element. The criterion (1), and its proof, hold without change for any well-ordered set S in
place of $\mathbb{N}$ (which is a well-ordered set). But notice that it won’t be enough to prove that
F(n) implies F(n + 1) (where n + 1 denotes the least element > n, if it exists). The reason
is, given an element m, there may exist elements < m but no element k such that m = k + 1.
Then the induction from n to n + 1 will fail to "reach" m. For more on this topic, look for
"limit ordinals".


Informally, any variety of induction which works for ordered sets S in which a segment
$S_x = \{y \in S \mid y < x\}$ may be infinite, is called "transfinite induction".


37.1.5 Noetherian induction 


An ordered set S, or its order, is called Noetherian if any non-empty subset of S has a
maximal element. Several equivalent definitions are possible, such as the "ascending chain condition":
any strictly increasing sequence of elements of S is finite. The following result is easily proved
by contradiction.


Principle of Noetherian induction: Let $(S, \le)$ be a set with a Noetherian order, and let
T be a subset of S having this property: if $x \in S$ is such that the condition y > x implies
$y \in T$, then $x \in T$. Then T = S.


So, to prove something "F(x)" about every element x of a Noetherian set, it is enough to
prove that "F(z) for all z > y" implies "F(y)". This time the induction is going downward,
but of course that is only a matter of notation. The opposite of a Noetherian order, i.e. an
order in which any strictly decreasing sequence is finite, is also in use; it is called a partial
well-order, or an ordered set having no infinite antichain.


The standard example of a Noetherian ordered set is the set of ideals in a Noetherian ring.
But the notion has various other uses, in topology as well as algebra. For a nontrivial
example of a proof by Noetherian induction, look up the Hilbert basis theorem.


37.1.6 Inductive ordered sets 


An ordered set $(S, \le)$ is said to be inductive if any totally ordered subset of S has an
upper bound in S. Since the empty set is totally ordered, any inductive ordered set is
nonempty. We have this important result, known as Zorn’s lemma:

Any inductive ordered set has a maximal element.


Zorn’s lemma is widely used in existence proofs, rather than in proofs of a property F(x) of
an arbitrary element x of an ordered set. Let me sketch one typical application. We claim
that every vector space has a basis. First, we prove that if a free subset F of a vector space
V is a maximal free subset (with respect to the order relation $\subseteq$), then it is a basis. Next,
to see that the set of free subsets is inductive, it is enough to verify that the union of any
totally ordered set of free subsets is free, because that union is an upper bound on the totally
ordered set. Last, we apply Zorn’s lemma to conclude that V has a maximal free subset.


Version: 10 Owner: Daume Author(s): Larry Hammick, slider142


Chapter 38


03F30 — First-order arithmetic and
fragments


38.1 Elementary Functional Arithmetic 


Elementary Functional Arithmetic, or EFA, is a weak theory of arithmetic created
by removing induction from Peano Arithmetic. Because it lacks induction, axioms defining
exponentiation must be added.

• $\forall x\,(x' \neq 0)$ (0 is the first number)
• $\forall x, y\,(x' = y' \to x = y)$ (the successor function is one-to-one)
• $\forall x\,(x + 0 = x)$ (0 is the additive identity)
• $\forall x, y\,(x + y' = (x + y)')$ (addition is the repeated application of the successor function)
• $\forall x\,(x \cdot 0 = 0)$
• $\forall x, y\,(x \cdot (y') = x \cdot y + x)$ (multiplication is repeated addition)
• $\forall x\,(\neg(x < 0))$ (0 is the smallest number)
• $\forall x, y\,(x < y' \leftrightarrow x < y \vee x = y)$
• $\forall x\,(x^0 = 1)$
• $\forall x, y\,(x^{y'} = x^y \cdot x)$


Version: 2 Owner: Henry Author(s): Henry


38.2 PA


Peano Arithmetic (PA) is the restriction of Peano’s axioms to a first order theory of
arithmetic. The only change is that the induction axiom is replaced by induction restricted to
arithmetic formulas:


\[ \phi(0) \wedge \forall x\,(\phi(x) \to \phi(x')) \to \forall x\,\phi(x) \quad \text{where } \phi \text{ is arithmetical} \]


Note that this replaces the single, second-order, axiom with a countably infinite
schema of axioms.


Appropriate axioms defining +, ·, and < are included. A full list of the axioms of PA looks
like this (although the exact list of axioms varies somewhat from source to source):

• $\forall x\,(x' \neq 0)$ (0 is the first number)
• $\forall x, y\,(x' = y' \to x = y)$ (the successor function is one-to-one)
• $\forall x\,(x + 0 = x)$ (0 is the additive identity)
• $\forall x, y\,(x + y' = (x + y)')$ (addition is the repeated application of the successor function)
• $\forall x\,(x \cdot 0 = 0)$
• $\forall x, y\,(x \cdot (y') = x \cdot y + x)$ (multiplication is repeated addition)
• $\forall x\,(\neg(x < 0))$ (0 is the smallest number)
• $\forall x, y\,(x < y' \leftrightarrow x < y \vee x = y)$
• $\phi(0) \wedge \forall x\,(\phi(x) \to \phi(x')) \to \forall x\,\phi(x)$ where $\phi$ is arithmetical


Version: 7 Owner: Henry Author(s): Henry 


38.3 Peano arithmetic 


Peano’s axioms are a definition of the set of natural numbers, denoted N. From these 
axioms Peano arithmetic on natural numbers can be derived. 


1. $0 \in \mathbb{N}$ (0 is a natural number)
2. For each $x \in \mathbb{N}$, there exists exactly one $x' \in \mathbb{N}$, called the successor of x
3. $x' \neq 0$ (0 is not the successor of any natural number)
4. x = y if and only if x' = y'.
5. (axiom of induction) If $M \subseteq \mathbb{N}$ and $0 \in M$ and $x \in M$ implies $x' \in M$, then $M = \mathbb{N}$.


The successor of x is sometimes denoted Sx instead of x'. We then have 1 = S0, 2 = S1 =
SS0, and so on.


Peano arithmetic consists of statements derived via these axioms. For instance, from these
axioms we can define addition and multiplication on natural numbers. Addition is defined
as


\[ x + 1 = x' \quad \text{for all } x \in \mathbb{N} \]
\[ x + y' = (x + y)' \quad \text{for all } x, y \in \mathbb{N} \]


Addition defined in this manner can then be proven to be both associative and commutative.


Multiplication is


\[ x \cdot 1 = x \quad \text{for all } x \in \mathbb{N} \]
\[ x \cdot y' = x \cdot y + x \quad \text{for all } x, y \in \mathbb{N} \]


This definition of multiplication can also be proven to be both associative and commutative,
and it can also be shown to be distributive over addition.
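

The recursion equations for addition and multiplication translate almost literally into code. The sketch below is illustrative only: it models natural numbers by Python integers, takes `succ` to be "add one", and, as an assumption, uses the base clauses x + 0 = x and x · 0 = 0 from the PA axiom list of the previous entry so that the functions are defined at 0 as well. The function names are ours.

```python
def succ(x):
    """The successor x' of a natural number x (modelled by a Python int)."""
    return x + 1


def add(x, y):
    """x + 0 = x and x + y' = (x + y)'."""
    if y == 0:
        return x
    return succ(add(x, y - 1))      # y - 1 is the number whose successor is y


def mul(x, y):
    """x * 0 = 0 and x * y' = x * y + x."""
    if y == 0:
        return 0
    return add(mul(x, y - 1), x)


# Spot checks of the properties that the text says can be proven.
small = range(6)
assert all(add(x, y) == add(y, x) for x in small for y in small)
assert all(mul(x, y) == mul(y, x) for x in small for y in small)
assert all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
           for x in small for y in small for z in small)
```

Such checks only cover finitely many cases; the general statements require the induction axiom.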


Version: 4 Owner: Henry Author(s): Henry, Logan


Chapter 39


03F35 — Second- and higher-order
arithmetic and fragments


39.1 $ACA_0$


$ACA_0$ is a weakened form of second order arithmetic. Its axioms include the axioms of
PA together with arithmetic comprehension.


Version: 1 Owner: Henry Author(s): Henry


39.2 $RCA_0$


$RCA_0$ is a weakened form of second order arithmetic. It consists of the axioms of PA other
than induction, together with $\Sigma^0_1$-IND and $\Delta^0_1$-CA.


Version: 1 Owner: Henry Author(s): Henry 


39.3 $Z_2$


$Z_2$ is the full system of second order arithmetic, that is, the full theory of numbers and sets
of numbers. It is sufficient for a great deal of mathematics, including much of number theory
and analysis.


The axioms defining addition, multiplication, and comparison are the same as
those of PA. $Z_2$ adds the full induction axiom and the full comprehension axiom.


39.4 comprehension axiom


The axiom of comprehension (CA) states that every formula defines a set. That is,


\[ \exists X\, \forall x\,(x \in X \leftrightarrow \phi(x)) \quad \text{for any formula } \phi \text{ where } X \text{ does not occur free in } \phi \]


The names specification and separation are sometimes used in place of comprehension,
particularly for weakened forms of the axiom (see below).


In theories which make no distinction between objects and sets (such as ZF), this formulation
leads to Russell’s paradox; however, in stratified theories this is not a problem (for example
second order arithmetic includes the axiom of comprehension).


This axiom can be restricted in various ways. One possibility is to restrict it to forming
subsets of sets:


\[ \forall Y\, \exists X\, \forall x\,(x \in X \leftrightarrow x \in Y \wedge \phi(x)) \quad \text{for any formula } \phi \text{ where } X \text{ does not occur free in } \phi \]


This formulation (used in ZF set theory) is sometimes called the Aussonderungsaxiom.


Another way is to restrict $\phi$ to some family F, giving the axiom F-CA. For instance the
axiom $\Sigma^0_1$-CA is:


\[ \exists X\, \forall x\,(x \in X \leftrightarrow \phi(x)) \quad \text{where } \phi \text{ is } \Sigma^0_1 \text{ and } X \text{ does not occur free in } \phi \]


A third form (usually called separation) uses two formulas, and guarantees only that those
satisfying one are included while those satisfying the other are excluded. The unrestricted
form is the same as unrestricted collection but, for instance, $\Sigma^0_1$ separation:


\[ \forall x\, \neg(\phi(x) \wedge \psi(x)) \to \exists X\, \forall x\,((\phi(x) \to x \in X) \wedge (\psi(x) \to x \notin X)) \]


where $\phi$ and $\psi$ are $\Sigma^0_1$ and X does not occur free in $\phi$ or $\psi$


is weaker than $\Sigma^0_1$-CA.


Version: 4 Owner: Henry Author(s): Henry 


39.5 induction axiom 


An induction axiom specifies that a theory includes induction, possibly restricted to specific
formulas. IND is the general axiom of induction:


\[ \phi(0) \wedge \forall x\,(\phi(x) \to \phi(x + 1)) \to \forall x\,\phi(x) \quad \text{for any formula } \phi \]


If $\phi$ is restricted to some family of formulas F then the axiom is called F-IND, or F induction.
For example the axiom $\Sigma^0_1$-IND is:


\[ \phi(0) \wedge \forall x\,(\phi(x) \to \phi(x + 1)) \to \forall x\,\phi(x) \quad \text{where } \phi \text{ is } \Sigma^0_1 \]


Version: 4 Owner: Henry Author(s): Henry 




Chapter 40 


03G05 — Boolean algebras 


40.1 Boolean algebra 


A Boolean algebra is a set B with two binary operators, $\wedge$ “meet” and $\vee$ “join,” and
one unary operator $'$ “complement,” which together are a Boolean lattice. If X and Y are
Boolean algebras, a function $f : X \to Y$ is a morphism of Boolean algebras when it is a
morphism of $\wedge$, $\vee$, and $'$.


Version: 6 Owner: greg Author(s): greg 


40.2 M. H. Stone’s representation theorem 


Theorem 3. Given a Boolean algebra B there exists a totally disconnected Hausdorff space
X such that B is isomorphic to the Boolean algebra of clopen subsets of X.


[Very rough sketch of proof] Let

\[ X = \{ f : B \to \{0, 1\} \mid f \text{ is a homomorphism} \} \]

endowed with the subspace topology induced by the product topology on $\{0,1\}^B$. Then X
is a totally disconnected Hausdorff space. Let Cl(X) denote the Boolean algebra of clopen
subsets of X; then the following map

\[ T : B \to Cl(X), \qquad T(x) = \{ f \in X \mid f(x) = 1 \} \]

is well defined (i.e. T(x) is indeed a clopen set), and an isomorphism.

Version: 4 Owner: Dr_Absentius Author(s): Dr_Absentius


Chapter 41


03G10 — Lattices and related 
structures 


41.1 Boolean lattice 


A Boolean lattice B is a distributive lattice in which for each element $x \in B$ there exists
a complement $x' \in B$ such that


\[ x \wedge x' = 0 \]
\[ x \vee x' = I \]
\[ (x')' = x \]
\[ (x \wedge y)' = x' \vee y' \]
\[ (x \vee y)' = x' \wedge y' \]


Given a set, any collection of subsets that is closed under unions, intersections, and
complements is a Boolean algebra.


Boolean rings (with identity, but allowing 0 = 1) are equivalent to Boolean lattices. To view
a Boolean ring as a Boolean lattice, define $x \wedge y = xy$ and $x \vee y = x + y + xy$. To view a
Boolean lattice as a Boolean ring, define $xy = x \wedge y$ and $x + y = (x' \wedge y) \vee (x \wedge y')$.
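

The translation between Boolean rings and Boolean lattices can be tried out on a concrete case. The Python sketch below is an illustration under our own choice of model: the subsets of a three-element set are encoded as bitmasks, with intersection as ring multiplication and symmetric difference as ring addition. It checks both translation formulas by brute force; all names are ours.

```python
FULL = 0b111                    # the eight subsets of a 3-element set, as bitmasks
ELEMENTS = range(FULL + 1)


def meet(x, y):
    return x & y                # intersection


def join(x, y):
    return x | y                # union


def complement(x):
    return FULL & ~x            # complement within the 3-element set


def ring_mul(x, y):
    return x & y                # Boolean ring product


def ring_add(x, y):
    return x ^ y                # Boolean ring sum (symmetric difference)


def ring_to_lattice_ok():
    """meet = xy and join = x + y + xy, for all pairs of elements."""
    return all(
        meet(x, y) == ring_mul(x, y)
        and join(x, y) == ring_add(ring_add(x, y), ring_mul(x, y))
        for x in ELEMENTS for y in ELEMENTS
    )


def lattice_to_ring_ok():
    """xy = meet and x + y = (x' meet y) join (x meet y'), for all pairs."""
    return all(
        ring_mul(x, y) == meet(x, y)
        and ring_add(x, y) == join(meet(complement(x), y), meet(x, complement(y)))
        for x in ELEMENTS for y in ELEMENTS
    )


assert ring_to_lattice_ok() and lattice_to_ring_ok()
```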


Version: 3 Owner: mathcam Author(s): mathcam, greg 


41.2 complete lattice 


A complete lattice is a nonempty poset in which every nonempty subset has a supremum
and an infimum.
In particular, a complete lattice is a lattice.


Version: 1 Owner: Evandar Author(s): Evandar 


41.3 lattice 


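A lattice is any non-empty poset P in which any two elements x and y have a
least upper bound, $x \vee y$, and a greatest lower bound, $x \wedge y$.


In other words, if $q = x \wedge y$ then $q \in P$, $q \le x$ and $q \le y$. Further, for all $p \in P$ if $p \le x$ and
$p \le y$, then $p \le q$.


Likewise, if $q = x \vee y$ then $q \in P$, $x \le q$ and $y \le q$, and for all $p \in P$ if $x \le p$ and $y \le p$,
then $q \le p$.


Since P is a poset, the operations $\wedge$ and $\vee$ have the following properties:


\[ x \wedge x = x, \qquad x \vee x = x \qquad \text{(idempotency)} \]
\[ x \wedge y = y \wedge x, \qquad x \vee y = y \vee x \qquad \text{(commutativity)} \]
\[ x \wedge (y \wedge z) = (x \wedge y) \wedge z, \qquad x \vee (y \vee z) = (x \vee y) \vee z \qquad \text{(associativity)} \]
\[ x \wedge (x \vee y) = x \vee (x \wedge y) = x \qquad \text{(absorption)} \]


Further, $x \le y$ is equivalent to:


\[ x \wedge y = x \text{ and } x \vee y = y \qquad \text{(consistency)} \]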
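

As a concrete instance, the divisors of 60 ordered by divisibility form a lattice, with gcd as greatest lower bound and lcm as least upper bound. The Python sketch below (our own illustration, with hypothetical function names) checks the listed laws by brute force for any finite candidate structure.

```python
from math import gcd


def lcm(x, y):
    """Least common multiple of two positive integers."""
    return x * y // gcd(x, y)


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def divides(x, y):
    return y % x == 0


def lattice_laws_hold(elements, meet, join, leq):
    """Check idempotency, commutativity, associativity, absorption, consistency."""
    for x in elements:
        if meet(x, x) != x or join(x, x) != x:
            return False
        for y in elements:
            if meet(x, y) != meet(y, x) or join(x, y) != join(y, x):
                return False
            if meet(x, join(x, y)) != x or join(x, meet(x, y)) != x:
                return False
            if leq(x, y) != (meet(x, y) == x and join(x, y) == y):
                return False
            for z in elements:
                if meet(x, meet(y, z)) != meet(meet(x, y), z):
                    return False
                if join(x, join(y, z)) != join(join(x, y), z):
                    return False
    return True


assert lattice_laws_hold(divisors(60), gcd, lcm, divides)
# min and max are not the meet and join for divisibility: consistency fails.
assert not lattice_laws_hold(divisors(60), min, max, divides)
```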


Version: 5 Owner: mps Author(s): mps, greg


Chapter 42


03G99 — Miscellaneous 


42.1 Chu space 


A Chu space over a set $\Sigma$ is a triple (A, r, X) with $r : A \times X \to \Sigma$. A is called the carrier
and X the cocarrier.


Although the definition is symmetrical, in practice asymmetric uses are common. In
particular, often X is just taken to be a set of functions from A to $\Sigma$, with r(a, x) = x(a) (such a
Chu space is called normal and is abbreviated (A, X)).


We define the perp of a Chu space $\mathcal{C} = (A, r, X)$ to be $\mathcal{C}^\perp = (X, r^\smile, A)$ where $r^\smile(x, a) =
r(a, x)$.


Define $\hat{r}$ and $\check{r}$ to be functions defining the rows and columns of $\mathcal{C}$ respectively, so that
$\hat{r}(a) : X \to \Sigma$ and $\check{r}(x) : A \to \Sigma$ are given by $\hat{r}(a)(x) = \check{r}(x)(a) = r(a, x)$. Thus the rows
of $\mathcal{C}$ are the columns of $\mathcal{C}^\perp$.


Using these definitions, a Chu space can be represented using a matrix.


If $\hat{r}$ is injective then we call $\mathcal{C}$ separable and if $\check{r}$ is injective we call $\mathcal{C}$ extensional. A Chu
space which is both separable and extensional is biextensional.


42.2 Chu transform


If $\mathcal{C} = (A, r, X)$ and $\mathcal{D} = (B, s, Y)$ are Chu spaces then we say a pair of functions $f : A \to B$
and $g : Y \to X$ form a Chu transform from $\mathcal{C}$ to $\mathcal{D}$ if for any $(a, y) \in A \times Y$ we have


\[ r(a, g(y)) = s(f(a), y). \]


42.3 biextensional collapse


If $\mathcal{C} = (A, r, X)$ is a Chu space, we can define the biextensional collapse of $\mathcal{C}$ to be


\[ (\hat{r}[A], r', \check{r}[X]) \quad \text{where } r'(\hat{r}(a), \check{r}(x)) = r(a, x). \]


That is, to name the rows of the biextensional collapse, we just use functions representing
the actual rows of the original Chu space (and similarly for the columns). The effect is to
merge indistinguishable rows and columns.


We say that two Chu spaces are equivalent if their biextensional collapses are isomorphic.

Version: 3 Owner: Henry Author(s): Henry


42.4 example of Chu space 


Any set A can be represented as a Chu space over {0,1} by (A, r, P(A)) with r(a, X) = 1
iff $a \in X$. This Chu space satisfies only the trivial property $2^A$, signifying the fact that sets
have no internal structure. If A = {a, b, c} then the matrix representation is:


     {}   {a}  {b}  {c}  {a,b}  {a,c}  {b,c}  {a,b,c}
a    0    1    0    0    1      1      0      1
b    0    0    1    0    1      0      1      1
c    0    0    0    1    0      1      1      1

Increasing the structure of a Chu space, that is, adding properties, is equivalent to deleting
columns. For instance we can delete the columns named {c} and {b,c} to turn this into
the partial order satisfying $c \le a$. By deleting more columns, we can further increase the
structure. For example, if we require that the set of rows be closed under the bitwise or
operation (and delete those columns which would prevent this) then it will define a
semilattice, and if it is closed under both bitwise or and bitwise and then it will define a
lattice. If the rows are also closed under complementation then we have a Boolean algebra.


Note that these are not arbitrary connections: the Chu transforms on each of these classes
of Chu spaces correspond to the appropriate notion of homomorphism for those classes.


For instance, to see that Chu transforms are order preserving on Chu spaces viewed as partial
orders, let $\mathcal{C} = (A, r, X)$ be a Chu space satisfying $b \le a$. That is, for any $x \in X$ we have
$r(b, x) = 1 \to r(a, x) = 1$. Then let (f, g) be a Chu transform to $\mathcal{D} = (B, s, Y)$, and suppose
s(f(b), y) = 1. Then r(b, g(y)) = 1 by the definition of a Chu transform, and then we have
r(a, g(y)) = 1 and so s(f(a), y) = 1, demonstrating that $f(b) \le f(a)$.


42.5 property of a Chu space


A property of a Chu space over $\Sigma$ with carrier A is some $Y \subseteq \Sigma^A$. We say that a Chu
space $\mathcal{C} = (A, r, X)$ satisfies Y if $X \subseteq Y$.


For example, every Chu space satisfies the property $\Sigma^A$.


Version: 2 Owner: Henry Author(s): Henry


Chapter 43


05-00 — General reference works 
(handbooks, dictionaries, 
bibliographies, etc.) 


43.1 example of pigeonhole principle 


An example.
For any group of 8 integers, there exist at least two of them whose difference is divisible by
7.


Consider the residue classes modulo 7. These are 0, 1, 2, 3, 4, 5, 6. We have seven classes
and eight integers. So it must be the case that 2 integers fall in the same residue class, and
therefore their difference will be divisible by 7.
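

The argument is constructive enough to run. The following Python sketch (the function name and the sample list are ours, chosen only for illustration) places each integer in its residue class modulo 7 and returns the first pair that lands in the same class.

```python
def pair_with_difference_divisible_by(numbers, modulus=7):
    """Return two entries of `numbers` in the same residue class, or None."""
    seen = {}                        # residue class -> first number found in it
    for x in numbers:
        r = x % modulus
        if r in seen:
            return seen[r], x        # second number in an occupied class
        seen[r] = x
    return None


sample = [3, 19, 24, 8, 41, 57, 100, 12]      # any 8 integers will do
a, b = pair_with_difference_divisible_by(sample)
assert (a - b) % 7 == 0
```

With fewer than eight integers the function may return `None`; with eight or more, the pigeonhole principle guarantees a pair.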


Version: 1 Owner: drini Author(s): drini 
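

43.2 multi-index derivative of a power


Theorem If i, k are multi-indices in $\mathbb{N}^n$, and $x = (x_1, \ldots, x_n)$, then


\[ \partial^i x^k = \begin{cases} \frac{k!}{(k-i)!}\, x^{k-i} & \text{if } i \le k, \\ 0 & \text{otherwise.} \end{cases} \]


Proof. The proof follows from the corresponding rule for the ordinary derivative; if i, k are
in 0, 1, 2, ..., then


\[ \frac{d^i}{dx^i} x^k = \begin{cases} \frac{k!}{(k-i)!}\, x^{k-i} & \text{if } i \le k, \\ 0 & \text{otherwise.} \end{cases} \qquad (43.2.1) \]


Suppose $i = (i_1, \ldots, i_n)$, $k = (k_1, \ldots, k_n)$, and $x = (x_1, \ldots, x_n)$. Then we have that


\[ \partial^i x^k = \frac{\partial^{|i|}}{\partial x_1^{i_1} \cdots \partial x_n^{i_n}}\, x_1^{k_1} \cdots x_n^{k_n}
   = \frac{\partial^{i_1}}{\partial x_1^{i_1}} x_1^{k_1} \cdots \frac{\partial^{i_n}}{\partial x_n^{i_n}} x_n^{k_n}. \]


For each r = 1, ..., n, the function $x_r^{k_r}$ only depends on $x_r$. In the above, each partial
differentiation $\partial/\partial x_r$ therefore reduces to the corresponding ordinary differentiation $d/dx_r$.
Hence, from equation 43.2.1, it follows that $\partial^i x^k$ vanishes if $i_r > k_r$ for any r = 1, ..., n. If
this is not the case, i.e., if $i \le k$ as multi-indices, then for each r,


\[ \frac{d^{i_r}}{dx_r^{i_r}} x_r^{k_r} = \frac{k_r!}{(k_r - i_r)!}\, x_r^{k_r - i_r}, \]


and the theorem follows.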


Version: 4 Owner: matte Author(s): matte 


43.3 multi-index notation 


Definition [BB] A multi-index is an n-tuple $(i_1, \ldots, i_n)$ of non-negative integers $i_1, \ldots, i_n$.
In other words, $i \in \mathbb{N}^n$. Usually, n is the dimension of the underlying space. Therefore, when
dealing with multi-indices, n is assumed clear from the context.


Operations on multi-indices


For a multi-index i, we define the length (or order) as

\[ |i| = i_1 + \cdots + i_n, \]

and the factorial as

\[ i! = \prod_{k=1}^{n} i_k!. \]

If $i = (i_1, \ldots, i_n)$ and $j = (j_1, \ldots, j_n)$ are two multi-indices, their sum and difference are
defined component-wise as

\[ i + j = (i_1 + j_1, \ldots, i_n + j_n), \]
\[ i - j = (i_1 - j_1, \ldots, i_n - j_n). \]

Thus $|i + j| = |i| + |j|$. Also, if $j_k \le i_k$ for all k = 1, ..., n, then we write $j \le i$. For
multi-indices i, j, with $j \le i$, we define

\[ \binom{i}{j} = \frac{i!}{(i-j)!\, j!}. \]

For a point $x = (x_1, \ldots, x_n)$ in $\mathbb{R}^n$ (with standard coordinates) we define

\[ x^i = \prod_{k=1}^{n} x_k^{i_k}. \]

Also, if $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth function and $i = (i_1, \ldots, i_n)$ is a multi-index, we define

\[ \partial^i f = \frac{\partial^{|i|}}{\partial^{i_1} e_1 \cdots \partial^{i_n} e_n} f, \]

where $e_1, \ldots, e_n$ are the standard unit vectors of $\mathbb{R}^n$. Since f is sufficiently smooth, the order
in which the derivations are performed is irrelevant. For multi-indices i and j, we thus have

\[ \partial^i \partial^j = \partial^{i+j} = \partial^{j+i} = \partial^j \partial^i. \]


Much of the motivation for the above notation is that standard results such as Leibniz’ rule,
Taylor’s formula, etc. can be written more or less as-is in many dimensions by replacing
indices in $\mathbb{N}$ with multi-indices. Below are some examples of this.


Examples


1. If n is a positive integer, and $x_1, \ldots, x_k$ are complex numbers, the multinomial
expansion states that

\[ (x_1 + \cdots + x_k)^n = n! \sum_{|i| = n} \frac{x^i}{i!}, \]

where $x = (x_1, \ldots, x_k)$ and i is a multi-index. (proof)


2. Leibniz’ rule [[]: If $f, g : \mathbb{R}^n \to \mathbb{R}$ are smooth functions, and j is a multi-index, then

\[ \partial^j (fg) = \sum_{i \le j} \binom{j}{i}\, \partial^i(f)\, \partial^{j-i}(g), \]

where i is a multi-index.
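

The multi-index operations are easy to experiment with. The Python sketch below is illustrative only (multi-indices are modelled as tuples, and every function name is ours). It implements length, factorial, power and binomial coefficient, and checks the multinomial expansion of Example 1 for one choice of integer values.

```python
from itertools import product
from math import factorial, prod


def length(i):
    """|i| = i_1 + ... + i_n."""
    return sum(i)


def mi_factorial(i):
    """i! = i_1! * ... * i_n!."""
    return prod(factorial(c) for c in i)


def mi_power(x, i):
    """x^i = x_1^{i_1} * ... * x_n^{i_n}."""
    return prod(xc ** ic for xc, ic in zip(x, i))


def mi_binomial(i, j):
    """i! / ((i - j)! j!), defined only when j <= i component-wise."""
    diff = tuple(a - b for a, b in zip(i, j))
    if any(d < 0 for d in diff):
        raise ValueError("binomial coefficient requires j <= i")
    return mi_factorial(i) // (mi_factorial(diff) * mi_factorial(j))


def multinomial_sum(x, n):
    """n! times the sum over |i| = n of x^i / i!, for integer entries of x."""
    total = 0
    for i in product(range(n + 1), repeat=len(x)):
        if length(i) == n:
            # n! / i! is an integer (a multinomial coefficient).
            total += factorial(n) // mi_factorial(i) * mi_power(x, i)
    return total


x = (2, 3, 5)
assert multinomial_sum(x, 4) == sum(x) ** 4
```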


REFERENCES 


1. http://www.math.umn.edu/ jodeit/course/TmprDist1.pd 
2. M. Reed, B. Simon, Methods of Mathematical Physics, I - Functional Analysis, Aca- 
demic Press, 1980. 


3. E. Weisstein, Eric W. Weisstein’s world of mathematics, entry on Multi-Index Notation


Chapter 44


05A10 — Factorials, binomial 
coefficients, combinatorial functions 


44.1 Catalan numbers 


The Catalan numbers, or Catalan sequence, have many interesting applications in com- 
binatorics. 


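The nth Catalan number is given by:


\[ C_n = \frac{1}{n+1} \binom{2n}{n}, \]


where $\binom{n}{r}$ represents the binomial coefficient. The first several Catalan numbers are 1, 1, 2,
5, 14, 42, 132, 429, 1430, 4862, ... (see EIS sequence A000108 for more terms). The Catalan
numbers are also generated by the recurrence relation


\[ C_0 = 1, \qquad C_n = \sum_{i=0}^{n-1} C_i\, C_{n-1-i}. \]


For example, $C_3 = 1 \cdot 2 + 1 \cdot 1 + 2 \cdot 1 = 5$, $C_4 = 1 \cdot 5 + 1 \cdot 2 + 2 \cdot 1 + 5 \cdot 1 = 14$, etc.
The ordinary generating function for the Catalan numbers is


\[ \sum_{n=0}^{\infty} C_n z^n = \frac{1 - \sqrt{1 - 4z}}{2z}. \]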
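

The listed values can be reproduced in a few lines. The Python sketch below is an illustration with our own function names, assuming the standard closed form $C_n = \binom{2n}{n}/(n+1)$ and the convolution recurrence $C_n = \sum_{i=0}^{n-1} C_i C_{n-1-i}$ with $C_0 = 1$; it computes the sequence both ways and compares the results.

```python
from math import comb


def catalan_closed(n):
    """C_n from the closed form binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def catalan_recurrence(count):
    """The first `count` Catalan numbers from C_n = sum of C_i * C_{n-1-i}."""
    c = [1]                                   # C_0 = 1
    for n in range(1, count):
        c.append(sum(c[i] * c[n - 1 - i] for i in range(n)))
    return c


first_ten = catalan_recurrence(10)
assert first_ten == [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]
assert first_ten == [catalan_closed(n) for n in range(10)]
```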


Interpretations of the nth Catalan number include:

1. The number of ways to arrange n pairs of matching parentheses, e.g.:


()


2. The number of ways a convex polygon of n + 2 sides can be split into n triangles.
3. The number of rooted binary trees with exactly n + 1 leaves.


The Catalan sequence is named for Eugène Charles Catalan, but it was discovered in 1751
by Euler when he was trying to solve the problem of subdividing polygons into triangles.


REFERENCES 


1. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics.
Addison-Wesley, 1998.


44.2 Levi-Civita permutation symbol


Definition. Let $k_i \in \{1, \ldots, n\}$ for all i = 1, ..., n. The Levi-Civita permutation
symbols $\varepsilon_{k_1 \cdots k_n}$ and $\varepsilon^{k_1 \cdots k_n}$ are defined as


\[ \varepsilon_{k_1 \cdots k_n} = \varepsilon^{k_1 \cdots k_n} =
\begin{cases}
+1 & \text{when } \{l \mapsto k_l\} \text{ is an even permutation (of } \{1, \ldots, n\}), \\
-1 & \text{when } \{l \mapsto k_l\} \text{ is an odd permutation,} \\
0 & \text{otherwise, i.e., when } k_i = k_j \text{ for some } i \neq j.
\end{cases} \]


The Levi-Civita permutation symbol is a special case of the generalized Kronecker delta symbol.
Using this fact one can write the Levi-Civita permutation symbol as the determinant of an
n × n matrix consisting of traditional delta symbols. See the entry on the generalized
Kronecker symbol for details.


When using the Levi-Civita permutation symbol and the generalized Kronecker delta symbol,
the Einstein summation convention is usually employed. In the below, we shall also use this
convention.

• When n = 2, we have for all i, j, m, n in {1, 2},


\[ \varepsilon_{ij}\, \varepsilon^{mn} = \delta_i^m \delta_j^n - \delta_i^n \delta_j^m, \qquad (44.2.1) \]
\[ \varepsilon_{ij}\, \varepsilon^{in} = \delta_j^n, \qquad (44.2.2) \]
\[ \varepsilon_{ij}\, \varepsilon^{ij} = 2. \qquad (44.2.3) \]


• When n = 3, we have for all i, j, k, m, n in {1, 2, 3},


\[ \varepsilon_{jmn}\, \varepsilon^{imn} = 2 \delta_j^i, \qquad (44.2.4) \]
\[ \varepsilon_{ijk}\, \varepsilon^{ijk} = 6. \qquad (44.2.5) \]


Let us prove these properties. The proofs are instructional since they demonstrate typical 
argumentation methods for manipulating the symbols. 


Proof. For equation 44.2.1, let us first note that both sides are antisymmetric with respect
to ij and mn. We therefore only need to consider the case $i \neq j$ and $m \neq n$. By substitution,
we see that the equation holds for $\varepsilon_{12}\varepsilon^{12}$, i.e., for i = m = 1 and j = n = 2. (Both sides are
then one.) Since the equation is antisymmetric in ij and mn, any set of values for these
can be reduced to the above case (which holds). The equation thus holds for all values of
ij and mn. Using equation 44.2.1, we have for equation 44.2.2


\[ \varepsilon_{ij}\, \varepsilon^{in} = \delta_i^i \delta_j^n - \delta_i^n \delta_j^i = 2\delta_j^n - \delta_j^n = \delta_j^n. \]


Here we used the Einstein summation convention with i going from 1 to 2. Equation 44.2.3
follows similarly from equation 44.2.2. To establish equation 44.2.4, let us first observe that
both sides vanish when $i \neq j$. Indeed, if $i \neq j$, then one cannot choose m and n such
that both permutation symbols on the left are nonzero. Then, with i = j fixed, there are
only two ways to choose m and n from the remaining two indices. For any such indices,
we have $\varepsilon_{jmn}\varepsilon^{imn} = (\varepsilon_{imn})^2 = 1$ (no summation), and the result follows. The last property
follows since 3! = 6 and for any distinct indices i, j, k in {1, 2, 3}, we have $\varepsilon_{ijk}\varepsilon^{ijk} = 1$ (no
summation). □
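

These contraction identities can also be confirmed by exhaustive computation. The Python sketch below is only an illustration (the function names are ours): it evaluates the permutation symbol by counting inversions, writes the Einstein summations out as explicit sums, and checks equations 44.2.1, 44.2.4 and 44.2.5.

```python
from itertools import product


def levi_civita(*idx):
    """Sign of the map l -> idx[l-1] on {1, ..., n}; 0 if it is not a permutation."""
    n = len(idx)
    if sorted(idx) != list(range(1, n + 1)):
        return 0
    sign = 1
    for a in range(n):
        for b in range(a + 1, n):
            if idx[a] > idx[b]:
                sign = -sign         # each inversion flips the sign
    return sign


def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0


def check_n2():
    """eps_ij eps^mn = d_i^m d_j^n - d_i^n d_j^m for all indices in {1, 2}."""
    return all(
        levi_civita(i, j) * levi_civita(m, n)
        == delta(i, m) * delta(j, n) - delta(i, n) * delta(j, m)
        for i, j, m, n in product((1, 2), repeat=4)
    )


def contract_n3(i, j):
    """Explicit sum over m, n of eps_jmn eps^imn."""
    return sum(levi_civita(j, m, n) * levi_civita(i, m, n)
               for m, n in product((1, 2, 3), repeat=2))


def full_contraction_n3():
    """Explicit sum over i, j, k of eps_ijk eps^ijk."""
    return sum(levi_civita(i, j, k) ** 2
               for i, j, k in product((1, 2, 3), repeat=3))


assert check_n2()
assert all(contract_n3(i, j) == 2 * delta(i, j)
           for i, j in product((1, 2, 3), repeat=2))
assert full_contraction_n3() == 6
```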


Examples. 


• The determinant of an n × n matrix $A = (a_{ij})$ can be written as

\[ \det A = \varepsilon_{i_1 \cdots i_n}\, a_{1 i_1} \cdots a_{n i_n}, \]

where each $i_l$ should be summed over 1, ..., n.


• If $A = (A^1, A^2, A^3)$ and $B = (B^1, B^2, B^3)$ are vectors in $\mathbb{R}^3$ (represented in some right
hand oriented orthonormal basis), then the ith component of their cross product equals

\[ (A \times B)^i = \varepsilon^{ijk} A^j B^k. \]

For instance, the first component of $A \times B$ is $A^2 B^3 - A^3 B^2$. From the above expression
for the cross product, it is clear that $A \times B = -B \times A$. Further, if $C = (C^1, C^2, C^3)$
is a vector like A and B, then the triple scalar product equals

\[ A \cdot (B \times C) = \varepsilon^{ijk} A^i B^j C^k. \]

From this expression, it can be seen that the triple scalar product is antisymmetric
when exchanging any adjacent arguments. For example, $A \cdot (B \times C) = -B \cdot (A \times C)$.


• Suppose $F = (F^1, F^2, F^3)$ is a vector field defined on some domain of $\mathbb{R}^3$ with Cartesian
coordinates $x = (x^1, x^2, x^3)$. Then the ith component of the curl of F equals

\[ (\nabla \times F)^i(x) = \varepsilon^{ijk} \frac{\partial}{\partial x^j} F^k(x). \]


Version: 7 Owner: matte Author(s): matte 


44.3 Pascal’s rule (bit string proof) 


This proof is based on an alternate, but equivalent, definition of the binomial coefficient: $\binom{n}{r}$
is the number of bit strings (finite sequences of 0s and 1s) of length n with exactly r ones.


We want to show that

\[ \binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}. \]


To do so, we will show that both sides of the equation are counting the same set of bit
strings.


The left-hand side counts the set of strings of n bits with r 1s. Suppose we take one of these
strings and remove the first bit b. There are two cases: either b = 1, or b = 0.


If b = 1, then the new string is n − 1 bits with r − 1 ones; there are $\binom{n-1}{r-1}$ bit strings of this
nature.


If b = 0, then the new string is n − 1 bits with r ones, and there are $\binom{n-1}{r}$ strings of this
nature.


Therefore every string counted on the left is covered by one, but not both, of these two cases.
If we add the two cases, we find that

\[ \binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}. \]
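

The counting argument can be replayed on small cases. The Python sketch below (an illustration with our own function names) enumerates the bit strings of length n with r ones, removes the first bit, and confirms that the two resulting groups are exactly the strings of length n − 1 with r − 1 ones and with r ones.

```python
from itertools import product


def bit_strings(n, r):
    """All bit strings of length n with exactly r ones, as tuples."""
    return [s for s in product((0, 1), repeat=n) if sum(s) == r]


def split_by_first_bit(n, r):
    """Remove the first bit b from each string; group the remainders by b."""
    strings = bit_strings(n, r)
    after_one = [s[1:] for s in strings if s[0] == 1]
    after_zero = [s[1:] for s in strings if s[0] == 0]
    return after_one, after_zero


n, r = 6, 3
after_one, after_zero = split_by_first_bit(n, r)
assert sorted(after_one) == sorted(bit_strings(n - 1, r - 1))
assert sorted(after_zero) == sorted(bit_strings(n - 1, r))
assert len(bit_strings(n, r)) == len(after_one) + len(after_zero)
```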
Version: 2 Owner: vampyr Author(s): vampyr


44.4 Pascal’s rule proof


We need to show

\[ \binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}. \]


Let us begin by writing the left-hand side as

\[ \frac{n!}{k!(n-k)!} + \frac{n!}{(k-1)!(n-(k-1))!}. \]


Getting a common denominator and simplifying, we have

\[ \frac{n!}{k!(n-k)!} + \frac{n!}{(k-1)!(n-k+1)!}
   = \frac{(n-k+1)\,n!}{(n-k+1)\,k!(n-k)!} + \frac{k\,n!}{k(k-1)!(n-k+1)!} \]
\[ = \frac{(n-k+1)\,n! + k\,n!}{k!(n-k+1)!} \]
\[ = \frac{(n+1)\,n!}{k!((n+1)-k)!} \]
\[ = \frac{(n+1)!}{k!((n+1)-k)!} \]
\[ = \binom{n+1}{k}. \]


44.5 Pascal’s triangle


Pascal’s triangle is the following configuration of numbers: 


                  1
                1   1
              1   2   1
            1   3   3   1
          1   4   6   4   1
        1   5  10  10   5   1
      1   6  15  20  15   6   1
    1   7  21  35  35  21   7   1

This goes on into infinity! Therefore we have only printed the first 8 lines. In general,
this triangle is constructed such that entries on the left side and right side are 1, and every
entry inside the triangle is obtained by summing the two entries immediately above it. For
instance, on the fourth row (counting the top row as row 0), 4 = 1 + 3.
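
The construction rule translates directly into a short program. The following Python sketch is an illustration only (the function name `pascal_rows` is an assumption of this example); it builds each row from the previous one by summing adjacent entries.

```python
def pascal_rows(count):
    """Return the first `count` rows of Pascal's triangle (row 0 is [1])."""
    rows = []
    row = [1]
    for _ in range(count):
        rows.append(row)
        # The border entries are 1; each inner entry is the sum of the
        # two entries immediately above it.
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return rows


for r in pascal_rows(8):
    print(" ".join(str(v) for v in r).center(30))
```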


Historically, the application of this triangle has been to give the coefficients when expanding
binomial expressions. For instance, to expand (a + b)^4, one simply looks up the coefficients
on the fourth row, and writes

\[ (a+b)^4 = a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4. \]


Pascal’s triangle is named after the French mathematician Blaise Pascal (1623-1662) [3].
However, this triangle was known at least around 1100 AD in China, five centuries before
Pascal [1]. In modern language, the expansion of the binomial is given by the binomial theorem
discovered by Isaac Newton in 1665 [2]: For any n = 1, 2, ... and real numbers a, b, we have

\[ (a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k
           = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + b^n. \]

Thus, in Pascal’s triangle, the entries on the nth row are given by the binomial coefficients

\[ \binom{n}{k} = \frac{n!}{k!(n-k)!} \]

for k = 0, ..., n.


REFERENCES 


1. Wikipedia’s entry on the binomial coefficients 
2. Wikipedia’s entry on Isaac Newton 
3. Wikipedia’s entry on Blaise Pascal
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44.6 Upper and lower bounds to binomial coefficient 


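\[ \binom{n}{k} \le \frac{n^k}{k!} \]

\[ \binom{n}{k} \le \left( \frac{n \cdot e}{k} \right)^k \]

\[ \binom{n}{k} \ge \left( \frac{n}{k} \right)^k \]

Also, for large n: $\binom{n}{k} \approx \frac{n^k}{k!}$.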
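
As a sanity check (not a proof), the three bounds can be tested numerically for small arguments. In this Python sketch the function name `binomial_bounds_hold` is an assumption of the example; the first two comparisons are done in exact integer arithmetic, the third in floating point.

```python
from math import comb, e, factorial


def binomial_bounds_hold(n, k):
    """Check (n/k)^k <= C(n,k) <= n^k/k! and C(n,k) <= (n*e/k)^k for 1 <= k <= n."""
    c = comb(n, k)
    lower_ok = n ** k <= c * k ** k           # (n/k)^k <= C(n,k), cleared of fractions
    upper_ok = c * factorial(k) <= n ** k     # C(n,k) <= n^k / k!, cleared of fractions
    exp_ok = c <= (n * e / k) ** k            # floating-point comparison
    return lower_ok and upper_ok and exp_ok


print(all(binomial_bounds_hold(n, k) for n in range(1, 31) for k in range(1, n + 1)))
```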
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44.7 binomial coefficient 


The number of ways to choose r objects from a set with n elements (n ≥ r) is given by

\[ \frac{n!}{(n-r)!\,r!}. \]

It is usually denoted in several ways, like

\[ \binom{n}{r}, \qquad C(n,r), \qquad C^n_r. \]

These numbers are called binomial coefficients, because they show up when expanding (x + y)^n.


Some interesting properties:

• $\binom{n}{r}$ is the coefficient of $x^r y^{n-r}$ in $(x + y)^n$ (binomial theorem).

• $\binom{n}{r} = \binom{n}{n-r}$.

• $\binom{n}{r-1} + \binom{n}{r} = \binom{n+1}{r}$ (Pascal’s rule).

• $\binom{n}{0} = 1 = \binom{n}{n}$ for all n.

• $\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = 2^n$.

• $\binom{n}{0} - \binom{n}{1} + \cdots + (-1)^n \binom{n}{n} = 0$ for n ≥ 1.

• $\sum_{t=k}^{n} \binom{t}{k} = \binom{n+1}{k+1}$.

In the context of computer science, it also helps to see $\binom{n}{r}$ as the number of strings
consisting of ones and zeros with r ones and n − r zeros. This equivalence comes from the
fact that if S is a finite set with n elements, $\binom{n}{r}$ is the number of distinct subsets of S with
r elements. For each subset T of S, consider the function

\[ \chi_T : S \to \{0, 1\} \]

where χ_T(x) = 1 whenever x ∈ T and 0 otherwise (so χ_T is the characteristic function for
T). For each T ∈ P(S) with r elements, χ_T can be used to produce a unique bit string of
length n with exactly r ones.
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44.8 double factorial 


The double factorial of a positive integer n is
\[ n!! = n(n-2) \cdots k_n \]
where k_n denotes 1 if n is odd and 2 if n is even.


For example,
7!! = 7 · 5 · 3 · 1 = 105
10!! = 10 · 8 · 6 · 4 · 2 = 3840


Note that n!! is not the same as (n!)!. 
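
A minimal Python sketch of the definition (the function name `double_factorial` is an assumption of this example) reproduces the two values above and shows how quickly n!! and (n!)! diverge.

```python
from math import factorial


def double_factorial(n):
    """n!! = n (n - 2) (n - 4) ... down to 1 (n odd) or 2 (n even), for n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


print(double_factorial(7), double_factorial(10))   # 105 3840
print(double_factorial(3), factorial(factorial(3)))  # 3 720
```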
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44.9 factorial 


For any non-negative integer n, the factorial of n, denoted n!, can be defined by
\[ n! = \prod_{r=1}^{n} r \]
where for n = 0 the empty product is taken to be 1.


Alternatively, the factorial can be defined recursively by 0! = 1 and n! = n(n—1)! for n > 0. 


n! is equal to the number of permutations of n distinct objects. For example, there are 5!
ways to arrange the five letters A, B, C, D and E into a word.

Euler’s gamma function Γ(x) generalizes the notion of factorial to almost all complex values,
as

\[ \Gamma(n + 1) = n! \]

for every non-negative integer n.
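
The recursive definition and the relation to the gamma function can be illustrated in a few lines of Python. This is a sketch only; `factorial_rec` is a name chosen for this example, and `math.gamma` is the standard-library gamma function for real arguments.

```python
from math import gamma, isclose


def factorial_rec(n):
    """Factorial via the recursion 0! = 1 and n! = n (n - 1)! for n > 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 if n == 0 else n * factorial_rec(n - 1)


print(factorial_rec(5))  # 120
print(all(isclose(gamma(n + 1), factorial_rec(n)) for n in range(15)))
```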
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44.10 falling factorial 


For n ∈ N, the rising and falling factorials are nth degree polynomials described, respectively,
by

\[ x^{\overline{n}} = x(x+1)\cdots(x+n-1), \]
\[ x^{\underline{n}} = x(x-1)\cdots(x-n+1). \]

The rising factorial is often written as (x)_n, and referred to as the Pochhammer symbol (see
hypergeometric series). Unfortunately, the falling factorial is also often denoted by (x)_n, so
great care must be taken when encountering this notation.
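
Because the two notations are so easily confused, it can help to compute both side by side. The following Python sketch (function names `rising` and `falling` are assumptions of this example) implements the two products and checks two standard relations between them.

```python
def rising(x, n):
    """Rising factorial x (x + 1) ... (x + n - 1); the empty product (n = 0) is 1."""
    result = 1
    for i in range(n):
        result *= x + i
    return result


def falling(x, n):
    """Falling factorial x (x - 1) ... (x - n + 1); the empty product (n = 0) is 1."""
    result = 1
    for i in range(n):
        result *= x - i
    return result


print(rising(3, 4), falling(5, 3))  # 360 60
```

For instance, rising(x, n) equals falling(x + n − 1, n), and also (−1)^n falling(−x, n).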


Notes. 


Unfortunately, the notational conventions for the rising and falling factorials lack a common
standard, and are plagued with a fundamental inconsistency. An examination of reference
works and textbooks reveals two fundamental sources of notation: works in combinatorics
and works dealing with hypergeometric functions.


Works of combinatorics [1,2,3] give greater focus to the falling factorial because of its role
in defining the Stirling numbers. The symbol (x)_n almost always denotes the falling factorial.


The notation for the rising factorial varies widely; we find (x),, in [1] and (x) in [3]. 


Works focusing on special functions [4,5] universally use (x)_n to denote the rising factorial and
use this symbol in the description of the various flavours of hypergeometric series. Watson [5]
credits this notation to Pochhammer [6], and indeed the special functions literature eschews
“falling factorial” in favour of “Pochhammer symbol”. Curiously, according to Knuth [7],
Pochhammer himself used (x)_n to denote the binomial coefficient. (Note: I haven’t verified
this.)


The notation featured in this entry is due to D. Knuth [7,8]. Given the fundamental
inconsistency in the existing notations, it seems sensible to break with both traditions, and
to adopt new and graphically suggestive notation for these two concepts. The traditional
notation, especially in the hypergeometric camp, is so deeply entrenched that, realistically,
one needs to be familiar with the traditional modes and to take care when encountering the
symbol (x)_n.


References 


1. Comtet, Advanced combinatorics.
2. Jordan, Calculus of finite differences.
3. Riordan, Introduction to combinatorial analysis.
4. Erdélyi et al., Bateman manuscript project.
5. Watson, A treatise on the theory of Bessel functions.
6. Pochhammer, “Ueber hypergeometrische Functionen n-ter Ordnung,” Journal für die
   reine und angewandte Mathematik 71 (1870), 316-352.
7. Knuth, “Two notes on notation”
8. Greene, Knuth, Mathematics for the analysis of algorithms.
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44.11 inductive proof of binomial theorem 


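When n = 1,

\[ (a+b)^1 = \sum_{k=0}^{1} \binom{1}{k} a^{1-k} b^k = \binom{1}{0} a^1 b^0 + \binom{1}{1} a^0 b^1 = a + b. \]

For the inductive step, assume it holds for m. Then for n = m + 1,

\begin{align*}
(a+b)^{m+1} &= a(a+b)^m + b(a+b)^m \\
 &= a \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^k + b \sum_{j=0}^{m} \binom{m}{j} a^{m-j} b^j
    && \text{by the inductive hypothesis} \\
 &= \sum_{k=0}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{j=0}^{m} \binom{m}{j} a^{m-j} b^{j+1}
    && \text{by multiplying through by } a \text{ and } b \\
 &= a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{j=0}^{m} \binom{m}{j} a^{m-j} b^{j+1}
    && \text{by pulling out the } k = 0 \text{ term} \\
 &= a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=1}^{m+1} \binom{m}{k-1} a^{m-k+1} b^k
    && \text{by letting } j = k - 1 \\
 &= a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=1}^{m} \binom{m}{k-1} a^{m+1-k} b^k + b^{m+1}
    && \text{by pulling out the } k = m + 1 \text{ term} \\
 &= a^{m+1} + b^{m+1} + \sum_{k=1}^{m} \left[ \binom{m}{k} + \binom{m}{k-1} \right] a^{m+1-k} b^k
    && \text{by combining the sums} \\
 &= a^{m+1} + b^{m+1} + \sum_{k=1}^{m} \binom{m+1}{k} a^{m+1-k} b^k
    && \text{from Pascal’s rule} \\
 &= \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1-k} b^k
    && \text{by adding in the } k = 0 \text{ and } k = m + 1 \text{ terms,}
\end{align*}

as desired.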
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44.12 multinomial theorem 


A multinomial is a mathematical expression consisting of two or more terms, e.g.
a_1 x_1 + a_2 x_2 + ... + a_k x_k.


The multinomial theorem provides the general form of the expansion of the powers of this
expression, in the process specifying the multinomial coefficients which are found in that
expansion. The expansion is:

\[ (x_1 + x_2 + \cdots + x_k)^n = \sum \frac{n!}{n_1!\, n_2! \cdots n_k!}\, x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k} \tag{44.12.1} \]

where the sum is taken over all multi-indices (n_1, ..., n_k) ∈ N^k that sum to n.

The expression $\frac{n!}{n_1!\, n_2! \cdots n_k!}$ occurring in the expansion is called multinomial coefficient and
is denoted by

\[ \binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!}. \]
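
The expansion can be verified numerically for small cases. The Python sketch below is illustrative only; the names `multinomial`, `multi_indices` and `expand_power` are assumptions of this example. It evaluates the right-hand side of the expansion by enumerating all multi-indices that sum to n.

```python
from itertools import product
from math import factorial


def multinomial(parts):
    """Multinomial coefficient n! / (n_1! n_2! ... n_k!) with n = sum(parts)."""
    result = factorial(sum(parts))
    for p in parts:
        result //= factorial(p)  # every intermediate quotient is an integer
    return result


def multi_indices(k, n):
    """All k-tuples of non-negative integers that sum to n."""
    return [t for t in product(range(n + 1), repeat=k) if sum(t) == n]


def expand_power(xs, n):
    """Evaluate the multinomial expansion of (x_1 + ... + x_k)^n term by term."""
    total = 0
    for idx in multi_indices(len(xs), n):
        term = multinomial(idx)
        for x, exponent in zip(xs, idx):
            term *= x ** exponent
        total += term
    return total


print(multinomial((2, 1, 1)))       # 12
print(expand_power((2, 3, 5), 4))   # 10000
```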
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44.13 multinomial theorem (proof) 


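Proof. The below proof of the multinomial theorem uses the binomial theorem and induction
on k. In addition, we shall use multi-index notation, writing $I_+^k$ for the set of multi-indices
with k non-negative integer components.


First, for k = 1, both sides equal $x_1^n$. For the induction step, suppose the multinomial
theorem holds for k. Then the binomial theorem and the induction assumption yield

\begin{align*}
(x_1 + \cdots + x_k + x_{k+1})^n
 &= \sum_{l=0}^{n} \binom{n}{l} (x_1 + \cdots + x_k)^l \, x_{k+1}^{n-l} \\
 &= \sum_{l=0}^{n} \binom{n}{l} \, l! \sum_{|i|=l} \frac{x^i}{i!} \, x_{k+1}^{n-l} \\
 &= n! \sum_{l=0}^{n} \sum_{|i|=l} \frac{x^i \, x_{k+1}^{n-l}}{i! \, (n-l)!}
\end{align*}

where x = (x_1, ..., x_k) and i is a multi-index in $I_+^k$. To complete the proof, we need to show
that the sets

\[ A = \{ (i_1, \ldots, i_k, n-l) \in I_+^{k+1} \mid l = 0, \ldots, n, \ |(i_1, \ldots, i_k)| = l \}, \]
\[ B = \{ j \in I_+^{k+1} \mid |j| = n \} \]

are equal. The inclusion A ⊆ B is clear since

\[ |(i_1, \ldots, i_k, n-l)| = l + n - l = n. \]

For B ⊆ A, suppose j = (j_1, ..., j_{k+1}) ∈ $I_+^{k+1}$, and |j| = n. Let l = |(j_1, ..., j_k)|. Then
l = n − j_{k+1}, so j_{k+1} = n − l for some l = 0, ..., n. It follows that A = B.


Let us define y = (x_1, ..., x_{k+1}) and let j = (j_1, ..., j_{k+1}) be a multi-index in $I_+^{k+1}$. Then

\begin{align*}
(x_1 + \cdots + x_{k+1})^n
 &= n! \sum_{|j|=n} \frac{x^{(j_1, \ldots, j_k)} \, x_{k+1}^{j_{k+1}}}{(j_1, \ldots, j_k)! \; j_{k+1}!} \\
 &= n! \sum_{|j|=n} \frac{y^j}{j!}.
\end{align*}

This completes the proof. □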
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44.14 proof of upper and lower bounds to binomial co- 
efficient 


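Let 2 ≤ k ≤ n be natural numbers. We’ll first prove the inequality

\[ \binom{n}{k} \le \left( \frac{n e}{k} \right)^k. \]

We rewrite $\binom{n}{k}$ as

\[ \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k^k} \cdot \frac{k^k}{k!} \]

and bound each of the k factors of the numerator by n to get

\[ \frac{n(n-1)\cdots(n-k+1)}{k^k} < \left( \frac{n}{k} \right)^k. \]

Multiplying the inequality above with $\frac{k^k}{k!} < e^{k-1}$ yields

\[ \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)\,k^k}{k^k \cdot k!} < e^{k-1} \left( \frac{n}{k} \right)^k < \left( \frac{n e}{k} \right)^k. \]

To conclude the proof we show that

\[ \prod_{i=1}^{n-1} \left( 1 + \frac{1}{i} \right)^i = \frac{n^n}{n!} \qquad \forall n \ge 2,\ n \in \mathbb{N}. \tag{44.14.1} \]

Indeed,

\[ \prod_{i=1}^{n-1} \left( 1 + \frac{1}{i} \right)^i = \prod_{i=1}^{n-1} \frac{(i+1)^i}{i^i} = \frac{n^{n-1}}{(n-1)!} = \frac{n^n}{n!}. \]

Since each left-hand factor in (44.14.1) is < e, we have $\frac{n^n}{n!} < e^{n-1}$. Since n − i < n for all 1 ≤ i ≤
k − 1, we immediately get

\[ \binom{n}{k} = \frac{\prod_{i=0}^{k-1} (n-i)}{k!} < \frac{n^k}{k!}. \]

And from

\[ k \le n \iff (n-i) \cdot k \ge (k-i) \cdot n \qquad \forall\, 1 \le i \le k-1 \]

we obtain

\[ \binom{n}{k} = \prod_{i=0}^{k-1} \frac{n-i}{k-i} \ge \left( \frac{n}{k} \right)^k. \]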
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Chapter 45 


05A15 — Exact enumeration problems, 
generating functions 


45.1 Stirling numbers of the first kind 


Introduction. The Stirling numbers of the first kind, frequently denoted as
s(n, k), k, n ∈ N, 1 ≤ k ≤ n,


are the coefficients of the falling factorial polynomials. To be more precise, the
defining relation for the Stirling numbers of the first kind is:

\[ x^{\underline{n}} = x(x-1)(x-2)\cdots(x-n+1) = \sum_{k=1}^{n} s(n,k)\, x^k. \]


Here is the table of some initial values. 


n\k     1     2     3     4     5
 1      1
 2     -1     1
 3      2    -3     1
 4     -6    11    -6     1
 5     24   -50    35   -10     1


Recurrence Relation. The evident observation that

\[ x^{\underline{n+1}} = x \cdot x^{\underline{n}} - n\, x^{\underline{n}} \]

leads to the following equivalent characterization of the s(n, k), in terms of a 2-place
recurrence formula:

\[ s(n+1, k) = s(n, k-1) - n\, s(n, k), \qquad 1 \le k < n, \]

subject to the following initial conditions:

\[ s(n, 0) = 0, \qquad s(1, 1) = 1. \]
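
The recurrence can be used to tabulate the numbers. The following Python sketch is an illustration under one extra assumption that the text leaves implicit: s(n, k) is taken to be 0 whenever k > n, so that the recurrence can also be applied for k = n and k = n + 1. The function names `stirling_first` and `falling` are chosen for this example.

```python
def stirling_first(max_n):
    """Signed Stirling numbers of the first kind s(n, k), 1 <= k <= n <= max_n."""
    s = {(1, 1): 1}

    def get(n, k):
        # Missing entries are 0: this covers s(n, 0) = 0 and the
        # assumed convention s(n, k) = 0 for k > n.
        return s.get((n, k), 0)

    for n in range(1, max_n):
        for k in range(1, n + 2):
            s[(n + 1, k)] = get(n, k - 1) - n * get(n, k)
    return s


def falling(x, n):
    """Falling factorial x (x - 1) ... (x - n + 1)."""
    result = 1
    for i in range(n):
        result *= x - i
    return result


table = stirling_first(5)
print([table[(5, k)] for k in range(1, 6)])  # [24, -50, 35, -10, 1]
```

The last lines of the test below also confirm the defining relation: summing s(n, k) x^k over k reproduces the falling factorial.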


Generating Function. There is also a strong connection with the generalized binomial formula,
which furnishes us with the following generating function:

\[ (1+t)^x = 1 + \sum_{n=1}^{\infty} \sum_{k=1}^{n} s(n,k)\, x^k \frac{t^n}{n!}. \]

This generating function implies a number of identities. Taking the derivative of both sides
with respect to t and equating powers leads to the recurrence relation described above.


Taking the derivative of both sides with respect to x gives

\[ (k+1)\, s(n, k+1) = \sum_{j=k}^{n-1} (-1)^{n-j-1} (n-j-1)! \binom{n}{j} s(j, k), \]

where for k = 0 we set s(0, 0) = 1. This is because the derivative of the left side of the
generating function equation with respect to x is

\[ (1+t)^x \ln(1+t) = (1+t)^x \sum_{k=1}^{\infty} (-1)^{k-1} \frac{t^k}{k}. \]

The relation

\[ (1+t)^{x_1} (1+t)^{x_2} = (1+t)^{x_1 + x_2} \]

yields the following family of summation identities. For any given k_1, k_2, d ≥ 1 we have

\[ \binom{k_1 + k_2}{k_1} s(d + k_1 + k_2,\ k_1 + k_2)
   = \sum_{d_1 + d_2 = d} \binom{d + k_1 + k_2}{k_1 + d_1} s(d_1 + k_1, k_1)\, s(d_2 + k_2, k_2), \]

where the sum runs over non-negative integers d_1, d_2.


Enumerative interpretation. The absolute value of the Stirling number of the first kind,
s(n, k), counts the number of permutations of n objects with exactly k orbits (equivalently,
with exactly k cycles). For example, s(4, 2) = 11 corresponds to the fact that
the symmetric group on 4 objects has 3 permutations of the form

(∗∗)(∗∗): 2 orbits of size 2 each,

and 8 permutations of the form

(∗∗∗): 1 orbit of size 3, and 1 orbit of size 1

(see the entry on cycle notation for the meaning of the above expressions).


Let us prove this. First, we can remark that the unsigned Stirling numbers of the first kind are
characterized by the following recurrence relation:

\[ |s(n+1, k)| = |s(n, k-1)| + n\, |s(n, k)|, \qquad 1 \le k < n. \]

To see why the above recurrence relation matches the count of permutations with k cycles,
consider forming a permutation of n + 1 objects from a permutation of n objects by adding
a distinguished object. There are exactly two ways in which this can be accomplished. We
could do this by forming a singleton cycle, i.e. leaving the extra object alone. This accounts
for the |s(n, k − 1)| term in the recurrence formula. We could also insert the new object into
one of the existing cycles. Consider an arbitrary permutation of n objects with k cycles, and
label the objects a_1, ..., a_n, so that the permutation is represented by

\[ \underbrace{(a_1 \ldots a_{j_1})(a_{j_1+1} \ldots a_{j_2}) \cdots (a_{j_{k-1}+1} \ldots a_n)}_{k \text{ cycles}}. \]

To form a new permutation of n + 1 objects and k cycles one must insert the new object into
this array. There are, evidently, n ways to perform this insertion. This explains the n |s(n, k)|
term of the recurrence relation. Q.E.D.
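
The enumerative interpretation can be confirmed by brute force for small n. This Python sketch is illustrative (the names `cycle_count` and `count_by_cycles` are assumptions of the example): it counts the cycles of every permutation of n objects and tallies the results.

```python
from itertools import permutations


def cycle_count(perm):
    """Number of cycles (orbits) of a permutation given as a tuple with i -> perm[i]."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:   # walk around the cycle containing `start`
                seen[j] = True
                j = perm[j]
    return cycles


def count_by_cycles(n):
    """counts[k] = number of permutations of n objects with exactly k cycles."""
    counts = [0] * (n + 1)
    for p in permutations(range(n)):
        counts[cycle_count(p)] += 1
    return counts


print(count_by_cycles(4))  # [0, 6, 11, 6, 1]
```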
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45.2 Stirling numbers of the second kind 


Summary. The Stirling numbers of the second kind,
S(n, k), k, n ∈ N, 1 ≤ k ≤ n,


are a doubly indexed sequence of natural numbers, enjoying a wealth of interesting
combinatorial properties. There exist several logically equivalent characterizations, but the starting
point of the present entry will be the following definition:


The Stirling number S(n, k) is the number of ways to partition a set of n objects
into k groups.


For example, S(4, 2) = 7 because there are seven ways to partition 4 objects (call them a,
b, c, d) into two groups, namely:


(a)(bcd), (b)(acd), (c)(abd), (d)(abc), (ab)(cd), (ac)(bd), (ad)(bc)

Four additional characterizations will be discussed in this entry:


• a recurrence relation
• a generating function related to the falling factorial
• differential operators
• a double-index generating function


Each of these will be discussed below, and shown to be equivalent.
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A recurrence relation. The Stirling numbers of the second kind can be characterized in 
terms of the following recurrence relation:

\[ S(n, k) = k\, S(n-1, k) + S(n-1, k-1), \qquad 1 < k < n, \]

subject to the following initial conditions:

\[ S(n, n) = S(n, 1) = 1. \]

Let us now show that the recurrence formula follows from the enumerative definition. Evidently,
there is only one way to partition n objects into 1 group (everything is in that group), and
only one way to partition n objects into n groups (every object is a group all by itself).
Proceeding recursively, a division of n objects a_1, ..., a_{n−1}, a_n into k groups can be achieved
by only one of two basic maneuvers:


• We could partition the first n − 1 objects into k groups, and then add object a_n into
one of those groups. There are k S(n − 1, k) ways to do this.


• We could partition the first n − 1 objects into k − 1 groups and then add object a_n as
a new, 1 element group. This gives an additional S(n − 1, k − 1) ways to create the
desired partition.


The recursive point of view, therefore, explains the connection between the recurrence
formula and the original definition.


Using the recurrence formula we can easily obtain a table of the initial Stirling numbers: 


n\k     1     2     3     4     5
 1      1
 2      1     1
 3      1     3     1
 4      1     7     6     1
 5      1    15    25    10     1
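
Both the recurrence and the enumerative definition are easy to put into code. The Python sketch below is an illustration (the names `stirling_second` and `partitions` are assumptions of this example): the first function follows the recurrence, and the second generates set partitions by exactly the two maneuvers described above, so the two counts can be compared.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling_second(n, k):
    """S(n, k) from the recurrence, with S(n, 1) = S(n, n) = 1."""
    if n < 1 or k < 1 or k > n:
        return 0
    if k == 1 or k == n:
        return 1
    return k * stirling_second(n - 1, k) + stirling_second(n - 1, k - 1)


def partitions(items):
    """Yield every partition of the list `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Either add `first` to one of the existing groups ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or make it a new one-element group.
        yield [[first]] + smaller


print([stirling_second(5, k) for k in range(1, 6)])                # [1, 15, 25, 10, 1]
print(sum(1 for p in partitions(list("abcd")) if len(p) == 2))     # 7
```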


Falling Factorials. Consider the vector space of polynomials in indeterminate x. The
most obvious basis of this vector space is the sequence of monomial
powers x^n, n ∈ N. However, the sequence of falling factorials:

\[ x^{\underline{n}} = x(x-1)(x-2)\cdots(x-n+1), \qquad n \in \mathbb{N}, \]

is also a basis, and hence can be used to generate the monomial basis. Indeed, the Stirling
numbers of the second kind can be characterized as the coefficients involved in the
corresponding change of basis matrix, i.e.

\[ x^n = \sum_{k=1}^{n} S(n,k)\, x^{\underline{k}}. \]
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So, for example, 


\[ x^4 = x + 7x(x-1) + 6x(x-1)(x-2) + x(x-1)(x-2)(x-3). \]


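Arguing inductively, let us prove that this characterization follows from the recurrence
relation. Evidently the formula is true for n = 1. Suppose then that the formula is true for a
given n. We have

\[ x \, x^{\underline{k}} = x^{\underline{k+1}} + k\, x^{\underline{k}}, \]

and hence using the recurrence relation we deduce that

\begin{align*}
x^{n+1} &= \sum_{k=1}^{n} S(n,k)\, x \, x^{\underline{k}} \\
        &= \sum_{k=1}^{n} \left( k\, S(n,k)\, x^{\underline{k}} + S(n,k)\, x^{\underline{k+1}} \right) \\
        &= \sum_{k=1}^{n+1} S(n+1,k)\, x^{\underline{k}}.
\end{align*}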


Differential operators. Let D_x denote the ordinary derivative applied to polynomials
in indeterminate x, and let T_x denote the differential operator xD_x. We have the following
characterization of the Stirling numbers of the second kind in terms of these two operators:

\[ (T_x)^n = \sum_{k=1}^{n} S(n,k)\, x^k (D_x)^k, \]

where an exponentiated differential operator denotes the operator composed with itself the
indicated number of times. Let us show that this follows from the recurrence relation. The
proof is, once again, inductive. Suppose that the characterization is true for a given n. We
have

\[ T_x \left( x^k (D_x)^k \right) = k\, x^k (D_x)^k + x^{k+1} (D_x)^{k+1}, \]

and hence using the recurrence relation we deduce that

\begin{align*}
(T_x)^{n+1} &= x D_x \left( \sum_{k=1}^{n} S(n,k)\, x^k (D_x)^k \right) \\
            &= \sum_{k=1}^{n} S(n,k) \left( k\, x^k (D_x)^k + x^{k+1} (D_x)^{k+1} \right) \\
            &= \sum_{k=1}^{n+1} S(n+1,k)\, x^k (D_x)^k.
\end{align*}
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Double index generating function. One can also characterize the Stirling numbers of 
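the second kind in terms of the following generating function:

\[ e^{x(e^t - 1)} = 1 + \sum_{n=1}^{\infty} \sum_{k=1}^{n} S(n,k)\, x^k \frac{t^n}{n!}. \]

Let us now prove this. Note that the differential equation

\[ \frac{d\xi}{dt} = \xi \]

admits the general solution

\[ \xi = e^t x. \]

It follows that for any polynomial p(ξ) we have

\[ \exp(t\, T_\xi)[p(\xi)] \Big|_{\xi = x} = \sum_{n=0}^{\infty} \frac{t^n}{n!} (T_\xi)^n [p(\xi)] \Big|_{\xi = x} = p(e^t x). \]

The proof is simple: just take D_t of both sides. To be more explicit,

\[ D_t [p(e^t x)] = p'(e^t x)\, e^t x = T_\xi [p(\xi)] \Big|_{\xi = x e^t}, \]

and that is exactly equal to D_t of the left-hand side. Since this relation holds for all
polynomials, it also holds for all formal power series. In particular if we apply the above relation
to e^ξ, use the result of the preceding section and note that

\[ D_\xi [e^\xi] = e^\xi, \]

we obtain

\begin{align*}
e^{x e^t} &= \sum_{n=0}^{\infty} \frac{t^n}{n!} (T_\xi)^n [e^\xi] \Big|_{\xi = x} \\
          &= e^x + \sum_{n=1}^{\infty} \sum_{k=1}^{n} S(n,k) \frac{t^n}{n!}\, \xi^k (D_\xi)^k [e^\xi] \Big|_{\xi = x} \\
          &= e^x \left( 1 + \sum_{n=1}^{\infty} \sum_{k=1}^{n} S(n,k)\, x^k \frac{t^n}{n!} \right).
\end{align*}

Dividing both sides by e^x we obtain the desired generating function. Q.E.D.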
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Chapter 46 


05A19 — Combinatorial identities 


46.1 Pascal’s rule 
Pascal’s rule is the binomial identity

\[ \binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k} \]

where 1 ≤ k ≤ n and $\binom{n}{k}$ is the binomial coefficient.
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Chapter 47 


05A99 — Miscellaneous 


47.1 principle of inclusion-exclusion 


The principle of inclusion-exclusion provides a way of methodically counting the union
of possibly non-disjoint sets.


Let C = {A_1, A_2, ... A_N} be a finite collection of finite sets. Let I_k represent the set of k-fold
intersections of members of C (e.g., I_2 contains all possible intersections of two sets chosen
from C).


Then

\[ \left| \bigcup_{i=1}^{N} A_i \right| = \sum_{j=1}^{N} \left( (-1)^{(j+1)} \sum_{S \in I_j} |S| \right). \]


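For example:

\[ |A \cup B| = (|A| + |B|) - (|A \cap B|) \]
\[ |A \cup B \cup C| = (|A| + |B| + |C|) - (|A \cap B| + |A \cap C| + |B \cap C|) + (|A \cap B \cap C|) \]

The principle of inclusion-exclusion, combined with de Morgan’s theorem, can be used to
count the intersection of sets as well. Let A be some universal set such that A_k ⊆ A for each
k, and let $\overline{A_k}$ represent the complement of A_k with respect to A. Then we have

\[ \left| \bigcap_{i=1}^{N} A_i \right| = \left| \overline{\bigcup_{i=1}^{N} \overline{A_i}} \right| = |A| - \left| \bigcup_{i=1}^{N} \overline{A_i} \right|, \]

thereby turning the problem of finding an intersection into the problem of finding a union.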
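
Both uses of the principle can be sketched in Python. The function names `union_size` and `intersection_size` and the sample sets are assumptions of this example; the code sums over all j-fold intersections with alternating signs, and then counts an intersection through the complements.

```python
from itertools import combinations


def union_size(sets):
    """Size of the union, computed by the inclusion-exclusion sum."""
    total = 0
    for j in range(1, len(sets) + 1):
        sign = (-1) ** (j + 1)
        for chosen in combinations(sets, j):
            # `chosen` is one way of picking j sets; add the size of their intersection.
            total += sign * len(set.intersection(*chosen))
    return total


def intersection_size(sets, universe):
    """Size of the intersection via de Morgan: |A| minus the union of the complements."""
    complements = [universe - s for s in sets]
    return len(universe) - union_size(complements)


sets = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}]
print(union_size(sets))                             # 7
print(intersection_size(sets, set(range(1, 9))))    # 1
```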
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47.2 principle of inclusion-exclusion proof 
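The proof is by induction. Consider a single set A_1. Then the principle of inclusion-exclusion
states that |A_1| = |A_1|, which is trivially true.

Now consider a collection of exactly two sets A_1 and A_2, which we write as A and B for
brevity. We know that

\[ A \cup B = (A \setminus B) \cup (B \setminus A) \cup (A \cap B). \]

Furthermore, the three sets on the right-hand side of that equation must be disjoint.
Therefore, by the addition principle, we have

\begin{align*}
|A \cup B| &= |A \setminus B| + |B \setminus A| + |A \cap B| \\
           &= |A \setminus B| + |A \cap B| + |B \setminus A| + |A \cap B| - |A \cap B| \\
           &= |A| + |B| - |A \cap B|.
\end{align*}

So the principle of inclusion-exclusion holds for any two sets.


Now consider a collection of N > 2 finite sets A_1, A_2, ... A_N. We assume that the principle
of inclusion-exclusion holds for any collection of M sets where 1 ≤ M < N. Because the
union of sets is associative, we may break up the union of all sets in the collection into a
union of two sets:

\[ \bigcup_{i=1}^{N} A_i = \left( \bigcup_{i=1}^{N-1} A_i \right) \cup A_N. \]

By the principle of inclusion-exclusion for two sets, we have

\[ \left| \bigcup_{i=1}^{N} A_i \right| = \left| \bigcup_{i=1}^{N-1} A_i \right| + |A_N| - \left| \left( \bigcup_{i=1}^{N-1} A_i \right) \cap A_N \right|. \]

Now, let I_k be the collection of all k-fold intersections of A_1, A_2, ... A_{N−1}, and let I'_k be
the collection of all k-fold intersections of A_1, A_2, ... A_N that include A_N. Note that A_N is
included in every member of I'_k and in no member of I_k, so the two sets do not duplicate
one another.


We then have

\[ \left| \bigcup_{i=1}^{N} A_i \right| = \sum_{j=1}^{N-1} \left( (-1)^{(j+1)} \sum_{S \in I_j} |S| \right) + |A_N| - \left| \left( \bigcup_{i=1}^{N-1} A_i \right) \cap A_N \right| \]

by the principle of inclusion-exclusion for a collection of N − 1 sets. Then, we may distribute
set intersection over set union to find that

\[ \left| \bigcup_{i=1}^{N} A_i \right| = \sum_{j=1}^{N-1} \left( (-1)^{(j+1)} \sum_{S \in I_j} |S| \right) + |A_N| - \left| \bigcup_{i=1}^{N-1} (A_i \cap A_N) \right|. \]

Note, however, that

\[ (A_x \cap A_N) \cap (A_y \cap A_N) = (A_x \cap A_y \cap A_N). \]

Hence we may again apply the principle of inclusion-exclusion for N − 1 sets, revealing that

\begin{align*}
\left| \bigcup_{i=1}^{N} A_i \right|
 &= \sum_{j=1}^{N-1} \left( (-1)^{(j+1)} \sum_{S \in I_j} |S| \right) + |A_N| - \sum_{j=1}^{N-1} \left( (-1)^{(j+1)} \sum_{S \in I_j} |S \cap A_N| \right) \\
 &= \sum_{j=1}^{N-1} \left( (-1)^{(j+1)} \sum_{S \in I_j} |S| \right) + |A_N| - \sum_{j=1}^{N-1} \left( (-1)^{(j+1)} \sum_{S \in I'_{j+1}} |S| \right) \\
 &= \sum_{j=1}^{N-1} \left( (-1)^{(j+1)} \sum_{S \in I_j} |S| \right) + |A_N| - \sum_{j=2}^{N} \left( (-1)^{(j)} \sum_{S \in I'_j} |S| \right) \\
 &= \sum_{j=1}^{N-1} \left( (-1)^{(j+1)} \sum_{S \in I_j} |S| \right) + |A_N| + \sum_{j=2}^{N} \left( (-1)^{(j+1)} \sum_{S \in I'_j} |S| \right).
\end{align*}

The second sum does not include I'_1. Note, however, that I'_1 = {A_N}, so we have

\begin{align*}
\left| \bigcup_{i=1}^{N} A_i \right|
 &= \sum_{j=1}^{N-1} \left( (-1)^{(j+1)} \sum_{S \in I_j} |S| \right) + \sum_{j=1}^{N} \left( (-1)^{(j+1)} \sum_{S \in I'_j} |S| \right) \\
 &= \sum_{j=1}^{N} \left( (-1)^{(j+1)} \sum_{S \in I_j \cup I'_j} |S| \right),
\end{align*}

where I_N is empty, since no N-fold intersection can be formed from only N − 1 sets.
Combining the two sums in this way yields the principle of inclusion-exclusion for N sets.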
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Chapter 48 


05B15 — Orthogonal arrays, Latin 
squares, Room squares 


48.1 example of Latin squares 


It is easily shown that the multiplication table (Cayley table) of a group has exactly these
properties and thus is a Latin square. The converse, however, is (unfortunately) not true, i.e.
not all Latin squares are multiplication tables for a group (the smallest counterexample is
a Latin square of order 5).
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48.2  graeco-latin squares 

Let A = (a_ij) and B = (b_ij) be two n × n matrices. We define their join as the matrix whose
(i, j)th entry is the pair (a_ij, b_ij).

A graeco-latin square is then the join of two latin squares in which every ordered pair of
symbols occurs exactly once.


The name comes from Euler’s use of greek and latin letters to differentiate the entries on
each array.


An example of graeco-latin square:


aα  bβ  cγ  dδ
dγ  cδ  bα  aβ
bδ  aγ  dβ  cα
cβ  dα  aδ  bγ
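
A small checker makes the definition concrete. In this Python sketch the function names (`is_latin`, `join`, `is_graeco_latin`) are assumptions of the example, and the two 4 × 4 squares are written as lists of strings, one string per row, using Latin and Greek letters.

```python
def is_latin(square):
    """True if every row and every column uses the same n symbols exactly once."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok


def join(a, b):
    """Entrywise pairing of two arrays of the same shape."""
    return [[(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def is_graeco_latin(a, b):
    """True if a and b are latin squares whose join has no repeated pair."""
    pairs = [p for row in join(a, b) for p in row]
    return is_latin(a) and is_latin(b) and len(set(pairs)) == len(pairs)


latin = ["abcd", "dcba", "badc", "cdab"]
greek = ["αβγδ", "γδαβ", "δγβα", "βαδγ"]
print(is_graeco_latin(latin, greek))  # True
print(is_graeco_latin(latin, latin))  # False
```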
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48.3 latin square 


A latin square of order n is an n × n array such that each column and each row are made
with the same n symbols, using every one exactly once.


Examples.

a b c d        1 2 3 4
c d a b        4 3 2 1
d c b a        2 1 4 3
b a d c        3 4 1 2
Version: 1 Owner: drini Author(s): drini 
48.4 magic square 
A magic square of order n is an n × n array using each one of the numbers 1, 2, 3, ..., n² once
and such that the sum of the numbers in each row, column or main diagonal is the same.


Example:

4 9 2
3 5 7
8 1 6


It’s easy to prove that the sum is always ½ n(n² + 1). So in the example with n = 3 the sum
is always ½ (3 × 10) = 15.
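
The definition and the formula for the common sum can be checked mechanically. The following Python sketch is an illustration only (the function name `is_magic` is an assumption of this example).

```python
def is_magic(square):
    """True if `square` uses 1..n^2 once each and all rows, columns and both
    main diagonals sum to n (n^2 + 1) / 2."""
    n = len(square)
    target = n * (n * n + 1) // 2
    uses_all = sorted(v for row in square for v in row) == list(range(1, n * n + 1))
    rows = all(sum(row) == target for row in square)
    cols = all(sum(row[j] for row in square) == target for j in range(n))
    diag = sum(square[i][i] for i in range(n)) == target
    anti = sum(square[i][n - 1 - i] for i in range(n)) == target
    return uses_all and rows and cols and diag and anti


print(is_magic([[4, 9, 2], [3, 5, 7], [8, 1, 6]]))  # True
print(is_magic([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # False
```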
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Chapter 49 


05B35 — Matroids, geometric lattices 


49.1 matroid 


A matroid, or an independence structure, is a kind of finite mathematical structure whose
properties imitate the properties of a finite subset of a vector space. Notions such as rank
and independence (of a subset) have a meaning for any matroid, as does the notion of duality.


A matroid permits several equivalent formal definitions: two definitions in terms of a rank
function, one in terms of independent subsets, and several more.


For a finite set X, P(X) will denote the set of all subsets of X, and |X| will denote the
number of elements of X. E is a fixed finite set throughout.


Definition 1: A matroid is a pair (E, r) where r is a mapping P(E) → N satisfying these
axioms:

r1) r(S) ≤ |S| for all S ⊆ E.
r2) If S ⊆ T ⊆ E then r(S) ≤ r(T).
r3) For any subsets S and T of E,

    r(S ∪ T) + r(S ∩ T) ≤ r(S) + r(T).

The matroid (E, r) is called normal if also
r*) r({e}) = 1 for any e ∈ E.
r is called the rank function of the matroid. (r3) is called the submodular inequality.


The notion of isomorphism between one matroid (E, r) and another (F, s) has the expected
meaning: there exists a bijection f : E → F which preserves rank, i.e. satisfies s(f(A)) =
r(A) for all A ⊆ E.


Definition 2: A matroid is a pair (E, r) where r is a mapping P(E) → N satisfying these
axioms:

q1) r(∅) = 0.
q2) If x ∈ E and S ⊆ E then r(S ∪ {x}) − r(S) ∈ {0, 1}.
q3) If x, y ∈ E and S ⊆ E and r(S ∪ {x}) = r(S ∪ {y}) = r(S) then r(S ∪ {x, y}) = r(S).

Definition 3: A matroid is a pair (E, I) where I is a subset of P(E) satisfying these axioms:

i1) ∅ ∈ I.
i2) If S ⊆ T ⊆ E and T ∈ I then S ∈ I.
i3) If S, T ∈ I and S, T ⊆ U ⊆ E and S and T are both maximal subsets of U with the
property that they are in I, then |S| = |T|.

An element of I is called an independent set. (E, I) is called normal if any singleton subset
of E is independent, i.e.
i*) {x} ∈ I for all x ∈ E.


Definition 4: A matroid is a pair (E, B) where B is a subset of P(E) satisfying these
axioms:

b1) B ≠ ∅.
b2) If S, T ∈ B and S ⊆ T then S = T.
b3) If S, T ∈ B and x ∈ E − S then there exists y ∈ E − T such that (S ∪ {x}) − {y} ∈ B.

An element of B is called a basis (of E). (E, B) is called normal if also
b*) the union of all b ∈ B equals E,
i.e. if any singleton subset of E can be extended to a basis.


Definition 5: A matroid is a pair (E, φ) where φ is a mapping P(E) → P(E) satisfying
these axioms:

φ1) S ⊆ φ(S) for all S ⊆ E.
φ2) If S ⊆ φ(T) then φ(S) ⊆ φ(T).
φ3) If x ∈ φ(S ∪ {y}) − φ(S) then y ∈ φ(S ∪ {x}).

φ is called the span mapping of the matroid, and φ(A) is called the span of the subset A.
(E, φ) is called normal if also
φ*) φ(∅) = ∅.


Definition 6: A matroid is a pair (E, C) where C is a subset of P(E) satisfying these
axioms:

c1) ∅ ∉ C.
c2) If S, T ∈ C and S ⊆ T then S = T.
c3) If S, T ∈ C and S ≠ T and x ∈ S ∩ T then there exists U ∈ C such that x ∉ U and
U ⊆ S ∪ T.

An element of C is called a circuit. (E, C) is called normal if also
c*) No singleton subset of E is a circuit.


49.1.1 Equivalence of the definitions 


It would take several pages to spell out what is a circuit in terms of rank, and likewise for 
each other possible pair of the alternative defining notions, and then to prove that the various 
sets of axioms unambiguously define the same structure. So let me sketch just one example: 
the equivalence of Definitions 1 (on rank) and 6 (on circuits). Assume first the conditions in 
Definition 1. Define a circuit as a minimal element A of P(E) having the property r(A) < |A|.
With a little effort, we verify the axioms (c1)-(c3). Now assume (c1)-(c3), and let r(A) be
the largest integer n such that A has a subset B for which


- B contains no element of C
- n = |B|.


One now proves (r1)-(r3). Next, one shows that if we define C in terms of r, and then 
another rank function s in terms of C, we end up with s=r. The equivalence of (r*) and 
(c*) is easy enough as well. 


49.1.2 Examples of matroids 


Let V be a vector space over a field k, and let E be a finite subset of V. For S ⊆ E, let r(S)
be the dimension of the subspace of V generated by S. Then (E, r) is a matroid. Such a
matroid, or one isomorphic to it, is said to be representable over k. The matroid is normal
iff 0 ∉ E. There exist matroids which are not representable over any field.


The second example of a matroid comes from graph theory. The following definition will be
rather informal, partly because the terminology of graph theory is not very well standardised.
For our present purpose, a graph consists of a finite set V, whose elements are called vertices,
plus a set E of two-element subsets of V, called edges. A circuit in the graph is a finite set
of at least three edges which can be arranged in a cycle:


{a, b}, {b, c}, ... {y, z}, {z, a}


such that the vertices a, b, ... z are distinct. With circuits thus defined, E satisfies the axioms
in Definition 6, and is thus a matroid, and in fact a normal matroid. (The definition is easily
adjusted to permit graphs with loops, which define non-normal matroids.) Such a matroid,
or one isomorphic to it, is called “graphic”.
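
For a small graph, the rank function of the graphic matroid can be computed and the axioms (r1)-(r3) of Definition 1 tested exhaustively. The Python sketch below rests on an assumption not spelled out in the text: the rank of an edge set is taken to be the size of a largest circuit-free subset, computed here with a union-find structure. The function names and the sample graph (a triangle with one pendant edge) are choices made for this example.

```python
from itertools import combinations


def graphic_rank(edges):
    """Size of a largest circuit-free subset of `edges` (pairs of vertices)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    rank = 0
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:        # the edge joins two components, so it closes no circuit
            parent[ra] = rb
            rank += 1
    return rank


def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def satisfies_rank_axioms(ground, rank):
    """Exhaustively test axioms (r1), (r2), (r3) on every subset of `ground`."""
    all_subsets = list(subsets(ground))
    r1 = all(rank(s) <= len(s) for s in all_subsets)
    r2 = all(rank(s) <= rank(t) for s in all_subsets for t in all_subsets if s <= t)
    r3 = all(rank(s | t) + rank(s & t) <= rank(s) + rank(t)
             for s in all_subsets for t in all_subsets)
    return r1 and r2 and r3


edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(graphic_rank(edges))                          # 3
print(satisfies_rank_axioms(edges, graphic_rank))   # True
```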


Let E = A ∪ B be a finite set, where A and B are nonempty and disjoint. Let G be a subset of
A × B. We get a “matching” matroid on E as follows. Each element of E defines a “line”
which is a subset (a row or column) of the set A × B. Let us call the elements of G “points”.
For any S ⊆ E let r(S) be the largest number n such that for some set of points P:


- |P| = n
- No two points of P are on the same line
- Any point of P is on a line defined by an element of S.


One can prove (it is not trivial) that r is the rank function of a matroid on E. That
matroid is normal iff every line contains at least one point. Matching matroids participate in
combinatorics, in connection with results on “transversals”, such as Hall’s marriage theorem.


49.1.3 The dual of a matroid 


Proposition: Let E be a matroid and r its rank function. Define a mapping s : P(E) → N
by

    s(A) = |A| − r(E) + r(E − A).


Then the pair (E, s) is a matroid (called the dual of (E, r)).


We leave the proof as an exercise. Also, it is easy to verify that the dual of the dual is the 
original matroid. A circuit in (E, s) is also referred to as a cocircuit in (E,r). There is a 
notion of cobasis also, and cospan. 
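
The dual rank formula is easy to experiment with. In the following Python sketch the names `dual_rank` and `uniform_rank` are assumptions of the example, and the test matroid is the uniform matroid on three elements in which r(S) = min(|S|, 1); its dual turns out to have rank function min(|A|, 2), and dualizing twice returns the original rank function.

```python
from itertools import combinations


def dual_rank(ground, rank):
    """Rank function of the dual: s(A) = |A| - r(E) + r(E - A)."""
    ground = frozenset(ground)

    def s(subset):
        subset = frozenset(subset)
        return len(subset) - rank(ground) + rank(ground - subset)

    return s


def uniform_rank(m):
    """Rank function r(S) = min(|S|, m) of a uniform matroid."""
    return lambda subset: min(len(subset), m)


E = {1, 2, 3}
r = uniform_rank(1)
s = dual_rank(E, r)
all_subsets = [frozenset(c) for k in range(4) for c in combinations(sorted(E), k)]
print([s(a) for a in all_subsets])  # [0, 1, 1, 1, 2, 2, 2, 2]
```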


If the dual of E is graphic, E is called cographic. This notion of duality agrees with the
notion of same name in the theory of planar graphs (and likewise in linear algebra): given
a plane graph, the dual of its matroid is the matroid of the dual graph. A matroid that is
both graphic and cographic is called planar, and various criteria for planarity of a graph can
be extended to matroids. The notion of orientability can also be extended from graphs to
matroids.
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49.1.4 Binary matroids 


A matroid is said to be binary if it is representable over the field of two elements. There are
several other (equivalent) characterisations of a binary matroid (E, r), such as:


- The symmetric difference of any family of circuits is the union of a family of pairwise
  disjoint circuits.
- For any circuit C and cocircuit D, we have |C ∩ D| ≡ 0 (mod 2).


Any graphic matroid is binary. The dual of a binary matroid is binary. 


49.1.5 Miscellaneous 


The definition of the chromatic polynomial of a graph,

\[ \chi(\lambda) = \sum_{F \subseteq E} (-1)^{|F|} \lambda^{r(E) - r(F)}, \]

extends without change to any matroid. This polynomial has something to say about the
decomposability of matroids into simpler ones.


Also on the topic of decomposability, matroids have a sort of structure theory, in terms of
what are called minors and separators. That theory, due to Tutte, goes by induction; roughly
speaking, it is an adaptation of the old algorithms for putting a matrix into a canonical form.


Along the same lines are several theorems on “basis exchange”, such as the following. Let
E be a matroid and let

    A = {a_1, ..., a_n}
    B = {b_1, ..., b_n}

be two (equipotent) bases of E. There exists a permutation ψ of the set {1, ..., n} such
that, for every m from 0 to n,

    {a_1, ..., a_m, b_ψ(m+1), ..., b_ψ(n)}

is a basis of E.


49.1.6 Further reading 


A good textbook is: 
James G. Oxley, Matroid Theory, Oxford University Press, New York etc., 1992 


plus the updates-and-errata file at Dr. Oxley’s website.
The chromatic polynomial is not discussed in Oxley, but see e.g.


Version: 3 Owner: drini Author(s): Larry Hammick, NeuRet 


49.2 polymatroid 


The polymatroid defined by a given matroid $(E, r)$ is the set of all functions $w : E \to \mathbb{R}$
such that

$$w(e) \ge 0 \quad \text{for all } e \in E,$$

$$\sum_{e \in S} w(e) \le r(S) \quad \text{for all } S \subseteq E.$$

Polymatroids are related to the convex polytopes seen in linear programming and have
similar uses.
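The two defining conditions can be tested directly for a small ground set. The sketch below assumes the matroid is given by a rank function on frozensets; the names `in_polymatroid` and `uniform_rank` and the example matroid (the uniform matroid of rank 2 on three elements) are illustrative assumptions. It enumerates all subsets, so it is only practical for small examples.

```python
from itertools import combinations

def in_polymatroid(w, ground, rank):
    """Return True if the weight function w lies in the polymatroid of (ground, rank).

    w      -- dict mapping each element of the ground set to a real number
    ground -- iterable of ground-set elements
    rank   -- function taking a frozenset S and returning r(S)
    """
    elements = list(ground)
    # First condition: w(e) >= 0 for every element e.
    if any(w[e] < 0 for e in elements):
        return False
    # Second condition: the total weight of every subset S is at most r(S).
    for size in range(len(elements) + 1):
        for subset in combinations(elements, size):
            if sum(w[e] for e in subset) > rank(frozenset(subset)):
                return False
    return True

def uniform_rank(S):
    """Rank function of the (hypothetical example) uniform matroid U(2,3)."""
    return min(len(S), 2)
```

For instance, the weights 1, 1, 0 on three elements satisfy both conditions for `uniform_rank`, while 1, 1, 0.5 violate the condition for the whole ground set.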


Version: 1 Owner: nobody Author(s): Larry Hammick 
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Chapter 50 


05C05 — Trees 


50.1 AVL tree 


An AVL tree is a balanced binary search tree where the heights of the two subtrees (children)
of a node differ by at most one. Look-up, insertion, and deletion are $O(\ln n)$, where $n$ is
the number of nodes in the tree.


The structure is named for the inventors, Adelson-Velskii and Landis (1962).
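The defining property can be written as a small checker. This sketch does not implement insertion or rebalancing; it only verifies that a given tree is a binary search tree in which the two subtree heights differ by at most one at every node. The tuple representation `(key, left, right)` with `None` for an empty subtree is an assumption made for the example.

```python
def height(node):
    """Height of a subtree; an empty subtree has height -1."""
    if node is None:
        return -1
    _, left, right = node
    return 1 + max(height(left), height(right))

def is_avl(node, lo=float("-inf"), hi=float("inf")):
    """Return True if the tree is a binary search tree satisfying the AVL balance rule."""
    if node is None:
        return True
    key, left, right = node
    if not (lo < key < hi):                       # search-tree ordering
        return False
    if abs(height(left) - height(right)) > 1:     # AVL balance condition
        return False
    return is_avl(left, lo, key) and is_avl(right, key, hi)

balanced = (2, (1, None, None), (3, None, None))
chain = (1, None, (2, None, (3, None, None)))     # heights differ by 2 at the root
```

Recomputing heights at every node keeps the sketch short at the cost of quadratic time; a real implementation would store the height (or balance factor) in each node.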


Version: 5 Owner: Thomas Heye Author(s): Thomas Heye 


50.2 Aronszajn tree 


A $\kappa$-tree $T$ for which $|T_\alpha| < \kappa$ for all $\alpha < \kappa$ and which has no cofinal branches is called a
$\kappa$-Aronszajn tree. If $\kappa = \omega_1$ then it is referred to simply as an Aronszajn tree.


If there are no $\kappa$-Aronszajn trees for some $\kappa$ then we say $\kappa$ has the tree property. $\omega$ has
the tree property, but no singular cardinal has the tree property.
Version: 6 Owner: Henry Author(s): Henry 


50.3 Suslin tree 


An Aronszajn tree is a Suslin tree iff it has no uncountable antichains.


Version: 1 Owner: Henry Author(s): Henry 
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50.4 antichain 

A subset $A$ of a poset $(P, <_P)$ is an antichain if no two elements are comparable. That is,
if $a, b \in A$ then $a \not<_P b$ and $b \not<_P a$.

A maximal antichain of $P$ is one which is maximal.


In particular, if $(P, <_P)$ is a tree then the maximal antichains are exactly those antichains
which intersect every branch, and if the tree is splitting then every level is a maximal
antichain.


Version: 3 Owner: Henry Author(s): Henry 


50.5 balanced tree 


A balanced tree is a rooted tree where each subtree of the root has an equal number of
nodes (or as near as possible). For an example, see binary tree.


Version: 2 Owner: Logan Author(s): Logan 


50.6 binary tree 


A binary tree is a rooted tree where every node has two or fewer children. A balanced
binary tree is a binary tree that is also a balanced tree. For example,

[Figure: a balanced binary tree.]

is a balanced binary tree.


The two (potential) children of a node in a binary tree are often called the left and right
children of that node. The left child of some node X and all that child’s descendants are the
left descendants of X. A similar definition applies to X’s right descendants. The left
subtree of X is X’s left descendants, and the right subtree of X is its right descendants.


Since we know the maximum number of children a binary tree node can have, we can make
some statements regarding minimum and maximum depth of a binary tree as it relates to
the total number of nodes. The maximum depth of a binary tree of $n$ nodes is $n - 1$ (every
non-leaf node has exactly one child). The minimum depth of a binary tree of $n$ nodes ($n > 0$)
is $\lfloor \log_2 n \rfloor$ (every non-leaf node has exactly two children, that is, the tree is balanced).
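The two depth bounds are easy to compute exactly. The function names below are illustrative; the minimum depth uses the integer bit length, which equals $\lfloor \log_2 n \rfloor + 1$ for positive $n$, so no floating-point logarithm is needed.

```python
def max_depth(n):
    """Largest possible depth of a binary tree with n >= 1 nodes (a single chain)."""
    if n < 1:
        raise ValueError("n must be positive")
    return n - 1

def min_depth(n):
    """Smallest possible depth of a binary tree with n >= 1 nodes: floor(log2(n))."""
    if n < 1:
        raise ValueError("n must be positive")
    return n.bit_length() - 1
```

For example, seven nodes can be arranged with depth as small as 2 or as large as 6.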


A binary tree can be implicitly stored as an array, if we designate a maximum
depth for the tree. We begin by storing the root node at index 0 in the array. We then store
its left child at index 1 and its right child at index 2. The children of the node at index 1
are stored at indices 3 and 4, and the children of the node at index 2 are stored at indices 5
and 6. This can be generalized as: if a node is stored at index k, then its left child is located
at index 2k + 1 and its right child at 2k + 2. This form of implicit storage thus eliminates
all overhead of the tree structure, but is only really advantageous for trees that tend to be
balanced. For example, here is the implicit array representation of the tree shown above.


| A | B | E | C | D | F | G |
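The index arithmetic can be sketched as follows, using the seven-node array from the example. The helper names are illustrative, and `parent` is an addition derived by inverting the two child formulas; it is not stated in the entry.

```python
def left(k):
    """Index of the left child of the node stored at index k."""
    return 2 * k + 1

def right(k):
    """Index of the right child of the node stored at index k."""
    return 2 * k + 2

def parent(k):
    """Index of the parent of the node at index k > 0 (inverse of left/right)."""
    if k <= 0:
        raise ValueError("the root has no parent")
    return (k - 1) // 2

# Implicit array for the example tree: A is the root, B and E are its children.
tree = ["A", "B", "E", "C", "D", "F", "G"]
children_of_b = (tree[left(1)], tree[right(1)])
```

Here `children_of_b` is the pair of nodes stored at indices 3 and 4, and a child index that falls outside the array means the corresponding subtree is empty.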


Many data structures are binary trees. For instance, heaps and binary search trees are binary
trees with particular properties.


Version: 3 Owner: Daume Author(s): Daume, Logan 


50.7 branch 


A subset $B$ of a tree $(T, <_T)$ is a branch if $B$ is a maximal linearly ordered subset of $T$.
That is:


- $<_T$ is a linear ordering of $B$.
- If $t \in T \setminus B$ then $B \cup \{t\}$ is not linearly ordered by $<_T$.


This is the same as the intuitive conception of a branch: it is a set of nodes starting at the
root and going all the way to the tip (in infinite sets the conception is more complicated,
since there may not be a tip, but the idea is the same). Since branches are maximal there is
no way to add an element to a branch and have it remain a branch.


A cofinal branch is a branch which intersects every level of the tree.


Version: 1 Owner: Henry Author(s): Henry 


50.8 child node (of a tree) 


A child node C of a node P in a tree is any node connected to P whose path distance
from the root node R is one greater than the path distance between P and R.


Drawn in the canonical root-at-top manner, a child node of a node P in a tree is simply any
node immediately below P which is connected to it.


Figure: A node (blue) and its children (red).


Version: 1 Owner: akrowne Author(s): akrowne 


50.9 complete binary tree 
A complete binary tree is a binary tree with the additional property that every node
must have exactly two “children” if an internal node and zero children if a leaf node.


More precisely: for our base case, the complete binary tree of exactly one node is simply
the tree consisting of that node by itself. The property of being “complete” is preserved
if, at each step, we expand the tree by connecting exactly zero or two individual nodes (or
complete trees) to any leaf node in the tree (but both must be connected to the same
node).


Version: 4 Owner: akrowne Author(s): akrowne 


50.10 digital search tree 
A digital search tree is a tree which stores strings internally so that there is no need for
extra leaf nodes to store the strings.


Version: 5 Owner: Logan Author(s): Logan 
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50.11 digital tree 


A digital tree is a tree for storing a set of strings where nodes are organized by substrings
common to two or more strings. Examples of digital trees are digital search trees and tries.


Version: 3 Owner: Logan Author(s): Logan 


50.12 example of Aronszajn tree 


Construction 1: If $\kappa$ is a singular cardinal then there is a construction of a $\kappa$-Aronszajn
tree. Let $\langle k_\beta \rangle_{\beta < \iota}$ with $\iota < \kappa$ be a sequence cofinal in $\kappa$. Then consider the tree
where $T = \{(\alpha, k_\beta) \mid \alpha < k_\beta \wedge \beta < \iota\}$ with $(\alpha_1, k_{\beta_1}) <_T (\alpha_2, k_{\beta_2})$ iff $\alpha_1 < \alpha_2$ and $k_{\beta_1} = k_{\beta_2}$.


Note that this is similar to (indeed, a subtree of) the construction given for a tree with no
cofinal branches. It consists of $\iota$ disjoint branches, with the $\beta$-th branch of height $k_\beta$. Since
$\iota < \kappa$, every level has fewer than $\kappa$ elements, and since the sequence is cofinal in $\kappa$, $T$ must
have height and cardinality $\kappa$.


Construction 2: We can construct an Aronszajn tree out of subsets of $\mathbb{Q}^+$.
$<_T$ will be defined by $x <_T y$ iff $y$ is an end-extension of $x$. That is, $x \subseteq y$ and if $r \in y \setminus x$
and $s \in x$ then $s < r$.


Let $T_0 = \{[0]\}$. Given a level $T_\alpha$, let $T_{\alpha+1} = \{x \cup \{q\} \mid x \in T_\alpha \wedge q > \max x\}$. That is, for
every element $x$ in $T_\alpha$ and every rational number $q$ larger than any element of $x$, $x \cup \{q\}$ is
an element of $T_{\alpha+1}$. If $\alpha < \omega_1$ is a limit ordinal then each element of $T_\alpha$ is the union of some
branch in $T(\alpha)$.


We can show by induction that $|T_\alpha| < \omega_1$ for each $\alpha < \omega_1$. For the base case, $T_0$ has only one
element. If $|T_\alpha| < \omega_1$ then $|T_{\alpha+1}| = |T_\alpha| \cdot |\mathbb{Q}| = |T_\alpha| \cdot \omega = \omega < \omega_1$. If $\alpha < \omega_1$ is a limit ordinal
then $T(\alpha)$ is a countable union of countable sets, and therefore itself countable. Therefore
there are a countable number of branches, so $T_\alpha$ is also countable. So $T$ has countable levels.


Suppose $T$ has an uncountable branch, $B = \langle b_0, b_1, \ldots \rangle$. Then for any $i < j < \omega_1$, $b_i \subset b_j$.
Then for each $i$, there is some $x_i \in b_{i+1} \setminus b_i$ such that $x_i$ is greater than any element of
$b_i$. Then $\langle x_0, x_1, \ldots \rangle$ is an uncountable increasing sequence of rational numbers. Since the
rational numbers are countable, there is no such sequence, so $T$ has no uncountable branch,
and is therefore Aronszajn.


Version: 1 Owner: Henry Author(s): Henry 
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50.13 example of tree (set theoretic) 


The set $\mathbb{Z}^+$ is a tree with $<_T = <$. This isn’t a very interesting tree, since it simply consists
of a line of nodes. However note that the height is $\omega$ even though no particular node has
that height.


A more interesting tree using $\mathbb{Z}^+$ defines $m <_T n$ if $i^a = m$ and $i^b = n$ for some $i, a, b \in \mathbb{Z}^+ \cup \{0\}$.
Then 1 is the root and all numbers which are not powers of another number are
in $T_1$. Then all squares (which are not also fourth powers) form $T_2$, and so on.


To illustrate the concept of a cofinal branch, observe that for any limit ordinal $\kappa$ we can
construct a $\kappa$-tree which has no cofinal branches. We let $T = \{(\alpha, \beta) \mid \alpha < \beta < \kappa\}$ and
$(\alpha_1, \beta_1) <_T (\alpha_2, \beta_2) \leftrightarrow \alpha_1 < \alpha_2 \wedge \beta_1 = \beta_2$. The tree then has $\kappa$ (disjoint) branches, each
consisting of the set $\{(\alpha, \beta) \mid \alpha < \beta\}$ for some $\beta < \kappa$. No branch is cofinal, since each branch
is capped at $\beta$ elements, but for any $\gamma < \kappa$, there is a branch of height $\gamma + 1$. Hence the
supremum of the heights is $\kappa$.


Version: 1 Owner: Henry Author(s): Henry 


50.14 extended binary tree 


An extended binary tree is a transformation of any binary tree into a complete binary tree.
This transformation consists of replacing every null subtree of the original tree with “special
nodes.” The nodes from the original tree are then internal nodes, while the “special nodes”
are external nodes.


For instance, consider the following binary tree.


The following tree is its extended binary tree. Empty circles represent internal nodes, and
filled circles represent external nodes.


Every internal node in the extended tree has exactly two children and every external node
is a leaf. The result is a complete binary tree.


Version: 4 Owner: Logan Author(s): Logan 
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50.15 external path length 


Given a binary tree $T$, construct its extended binary tree $T'$. The external path length
of $T$ is then defined to be the sum of the lengths of the paths to each of the external nodes.


For example, let $T$ be the following tree.


The extended binary tree of $T$ is


The external path length of $T$ (denoted $E$) is

$$E = 2 + 3 + 3 + 3 + 3 + 3 + 3 = 20$$


The internal path length of $T$ is defined to be the sum of the lengths of the paths to each
of the internal nodes. The internal path length of our example tree (denoted $I$) is

$$I = 1 + 2 + 0 + 2 + 1 + 2 = 8$$


Note that in this case $E = I + 2n$, where $n$ is the number of internal nodes. This happens
to hold for all binary trees.
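Both path lengths can be computed in one traversal, which also lets the identity $E = I + 2n$ be checked on examples. In this sketch a binary tree is a pair `(left, right)` and `None` stands for a null subtree, i.e. an external node of the extended tree. The representation and the example tree are assumptions: the tree was chosen to reproduce the totals quoted above, since the original figure is not available here.

```python
def path_lengths(tree, depth=0):
    """Return (internal path length, external path length, number of internal nodes)."""
    if tree is None:
        # A null subtree becomes an external node at this depth.
        return 0, depth, 0
    left, right = tree
    li, le, ln = path_lengths(left, depth + 1)
    ri, re, rn = path_lengths(right, depth + 1)
    return depth + li + ri, le + re, 1 + ln + rn

leaf = (None, None)
# Six internal nodes: the root, its two children, and three nodes at depth 2.
example = ((leaf, leaf), (leaf, None))
internal, external, n = path_lengths(example)
```

For this tree the traversal gives an internal path length of 8, an external path length of 20, and 6 internal nodes, in agreement with $E = I + 2n$.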


Version: 1 Owner: Logan Author(s): Logan 
50.16 internal node (of a tree) 


An internal node of a tree is any node which has degree greater than one. Or, phrased in
rooted tree terminology, the internal nodes of a tree are the nodes which have at least one
child node.


Figure: A tree with internal nodes highlighted in red. 


Version: 3 Owner: akrowne Author(s): akrowne 


50.17 leaf node (of a tree) 


A leaf of a tree is any node which has degree exactly 1. Put another way, a leaf node of
a rooted tree is any node which has no child nodes.


Figure: A tree with leaf nodes highlighted in red. 


Version: 2 Owner: akrowne Author(s): akrowne 


50.18 parent node (in a tree) 


A parent node P of a node C in a tree is the first node which lies along the path from C
to the root of the tree, R.


Drawn in the canonical root-at-top manner, the parent node of a node C in a tree is simply
the node immediately above C which is connected to it.


Figure: A node (blue) and its parent (red).


Version: 2 Owner: akrowne Author(s): akrowne 


50.19 proof that w has the tree property 


Let $T$ be a tree with finite levels and an infinite number of elements. Then consider the
elements of $T_0$. $T$ can be partitioned into the sets of descendants of each of these elements,
and since any finite partition of an infinite set has at least one infinite part, some element
$x_0$ in $T_0$ has an infinite number of descendants. The same procedure can be applied to the
children of $x_0$ to give an element $x_1 \in T_1$ which has an infinite number of descendants, and
then to the children of $x_1$, and so on. This gives a sequence $X = \langle x_0, x_1, \ldots \rangle$. The sequence
is infinite since each element has an infinite number of descendants, and since $x_{i+1}$ is always
a child of $x_i$, $X$ is a branch, and therefore an infinite branch of $T$.


Version: 2 Owner: Henry Author(s): Henry 


50.20 root (of a tree) 


The root of a tree is a place-holder node. It is typically drawn at the top of the page, with
the other nodes below (with all nodes having the same path distance from the root at the
same height).


Figure: A tree with root highlighted in red.


Any tree can be redrawn this way, selecting any node as the root. This is important to
note: taken as a graph in general, the notion of “root” is meaningless. We introduce a root
explicitly when we begin speaking of a graph as a tree; there is nothing in general that
selects a root for us.


However, there are some special cases of trees where the root can be distinguished from the
other nodes implicitly due to the properties of the tree. For instance, a root is uniquely
identifiable in a complete binary tree, where it is the only node with degree two.


Version: 4 Owner: akrowne Author(s): akrowne 


50.21 tree 


Formally, a forest is an undirected, acyclic graph. A forest consists of trees, which are
themselves acyclic, connected graphs. For example, the following diagram represents a forest,
each connected component of which is a tree.


All trees are forests, but not all forests are trees. As in a graph, a forest is made up of
vertices (which are often called nodes interchangeably) and edges. Like any graph, the vertices and
edges may each be labelled, that is, associated with some atom of data. Therefore a forest
or a tree is often used as a data structure.


Often a particular node of a tree is specified as the root. Such trees are typically drawn with
the root at the top of the diagram, with all other nodes depending down from it (however
this is not always the case). A tree where a root has been specified is called a rooted tree. A
tree where no root has been specified is called a free tree. When speaking of tree traversals,
and most especially of trees as data structures, rooted trees are often implied.


The edges of a rooted tree are often treated as directed. In a rooted tree, every non-root
node has exactly one edge that leads to the root. This edge can be thought of as connecting
each node to its parent. Often rooted trees are considered directed in the sense that all edges
connect parents to their children but not vice-versa. Given this parent-child relationship, a
descendant of a node in a directed tree is defined as any other node reachable from that
node (that is, a node’s children and all their descendants).


Given this directed notion of a rooted tree, a rooted subtree can be defined as any node 
of a tree and all of its descendants. This notion of a rooted subtree is very useful in dealing 
with trees inductively and defining certain algorithms inductively. 


Because of their simple structure and unique properties, trees and forests have many uses.
Because of the simple definition of various tree traversals, they are often used to store and
look up data. Many algorithms are based upon trees, or depend upon a tree in some manner,
such as the heapsort algorithm or Huffman encoding. There are also a great many specific
forms and families of trees, each with its own constraints, strengths, and weaknesses.


Version: 6 Owner: Logan Author(s): Logan 


50.22 weight-balanced binary trees are ultrametric 


Let $X$ be the set of leaf nodes in a weight-balanced binary tree. Let the distance between
leaf nodes be identified with the weighted path length between them. We will show that this
distance metric on $X$ is ultrametric.


Before we begin, let the join of any two nodes $x, y$, denoted $x \vee y$, be defined as the node
$z$ which is the most immediate common ancestor of $x$ and $y$ (that is, the common ancestor
which is farthest from the root). Also, we are using weight-balanced in the sense that


- the weighted path length from the root to each leaf node is equal, and
- each subtree is weight-balanced, too.


Lemma: two properties of weight-balanced trees


Because the tree is weight-balanced, the distances between any node and each of the leaf
node descendants of that node are equal. So, for any leaf nodes $x, y$,

$$d(x, x \vee y) = d(y, x \vee y) \qquad (50.22.1)$$

Hence,

$$d(x, y) = d(x, x \vee y) + d(y, x \vee y) = 2 \cdot d(x, x \vee y) \qquad (50.22.2)$$


Back to the main proof 


We will now show that the ultrametric three point condition holds for any three leaf nodes 
in a weight-balanced binary tree. 


Consider any three points $a, b, c$ in a weight-balanced binary tree. If $d(a,b) = d(b,c) = d(a,c)$,
then the three point condition holds. Now assume this is not the case. Without loss of
generality, assume that $d(a,b) < d(a,c)$.


Applying Eqn. 50.22.2,

$$2 \cdot d(a, a \vee b) < 2 \cdot d(a, a \vee c)$$

$$d(a, a \vee b) < d(a, a \vee c)$$


Note that both $a \vee b$ and $a \vee c$ are ancestors of $a$. Hence, $a \vee c$ is a more distant ancestor of
$a$ and so $a \vee c$ must be an ancestor of $a \vee b$.


Now, consider the path between $b$ and $c$. One way to get from $b$ to $c$ is to go from $b$ up to $a \vee b$, then
up to $a \vee c$, and then down to $c$. Since this is a tree, this is the only path. The highest node
in this path (the ancestor of both $b$ and $c$) was $a \vee c$, so the distance $d(b,c) = 2 \cdot d(b, a \vee c)$.


But by Eqn. 50.22.1 and Eqn. 50.22.2 (noting that $b$ is a descendant of $a \vee c$), we have

$$d(b,c) = 2 \cdot d(b, a \vee c) = 2 \cdot d(a, a \vee c) = d(a,c)$$


To summarize, we have d(a,b) < d(b,c) = d(a,c), which is the desired ultrametric three 
point condition. So we are done. 
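The three point condition can also be checked exhaustively on a small instance. The sketch below assumes the simplest weight-balanced tree: a binary tree of depth 3 in which every node has two children and every edge has weight 1. Each leaf is named by its root-to-leaf path as a string of 0s and 1s, so the join of two leaves corresponds to their longest common prefix. The names and the choice of tree are illustrative assumptions.

```python
from itertools import product

def leaf_distance(x, y):
    """Path length between two leaves given as equal-length root-to-leaf bit strings."""
    common = 0
    for cx, cy in zip(x, y):
        if cx != cy:
            break
        common += 1
    # Both leaves sit len(x) - common edges below their join.
    return 2 * (len(x) - common)

leaves = ["".join(bits) for bits in product("01", repeat=3)]

def is_ultrametric(points, d):
    """Check d(a, c) <= max(d(a, b), d(b, c)) for all triples of points."""
    return all(d(a, c) <= max(d(a, b), d(b, c))
               for a, b, c in product(points, repeat=3))
```

Calling `is_ultrametric(leaves, leaf_distance)` confirms the condition for all 512 ordered triples of leaves in this tree.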


Note that this means that, if $a, b$ are leaf nodes, and you are at a node outside the subtree
under $a \vee b$, then $d(\text{you}, a) = d(\text{you}, b)$. In other words (from the point of view of distance
between you and them), the structure of any subtree that is not your own doesn’t matter to
you. This is expressed in the three point condition as “if two points are closer to each other
than they are to you, then their distance to you is equal”.


(Above, we have only proved this if you are at a leaf node, but it works for any node which is
outside the subtree under $a \vee b$, because the paths to $a$ and $b$ must both pass through $a \vee b$.)
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50.23 weighted path length 


Given an extended binary tree $T$ (that is, simply any complete binary tree where leaves are
denoted as external nodes), associate weights with each external node. The weighted
path length of $T$ is the sum of the product of the weight and path length of each external
node, over all external nodes.


Another formulation is that weighted path length is $\sum_j w_j l_j$ over all external nodes $j$, where
$w_j$ is the weight of an external node $j$, and $l_j$ is the distance from the root of the tree to $j$.
If $w_j = 1$ for all $j$, then weighted path length is exactly the same as external path length.
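The sum $\sum_j w_j l_j$ can be computed recursively. In this sketch an internal node is a pair `(left, right)` and an external node is represented directly by its numeric weight; both the representation and the example weights are assumptions made for illustration.

```python
def weighted_path_length(tree, depth=0):
    """Sum of weight * depth over all external nodes of an extended binary tree."""
    if isinstance(tree, tuple):
        left, right = tree                      # internal node with two children
        return (weighted_path_length(left, depth + 1)
                + weighted_path_length(right, depth + 1))
    return tree * depth                         # external node: tree is its weight

# Hypothetical example: weights 1 and 2 at depth 2, weight 3 at depth 1.
example = ((1, 2), 3)
total = weighted_path_length(example)           # 1*2 + 2*2 + 3*1
```

With every weight set to 1, as in `((1, 1), 1)`, the same function returns the external path length of the tree.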


Example 


Let T be the following extended binary tree. Square nodes are external nodes, and circular
nodes are internal nodes. Values in external nodes indicate weights, which are given in this
problem, while values in internal nodes represent the weighted path length of subtrees rooted
at those nodes, and are calculated from the given weights and the given tree. The weight of
the tree as a whole is given at the root of the tree.


This tree happens to give the minimum weighted path length for this particular set of 


weights. 
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Chapter 51 


05C10 — Topological graph theory, 
imbedding 


51.1 Heawood number 


The Heawood number of a surface is the maximal number of colors needed to color any graph
embedded in the surface. For example, the four-color conjecture states that the Heawood number
of the sphere is four.


In 1890 Heawood proved for all surfaces except the sphere that the Heawood number satisfies

$$H(S) \le \left\lfloor \frac{7 + \sqrt{49 - 24 e(S)}}{2} \right\rfloor,$$

where $e(S)$ is the Euler characteristic of the surface.
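Heawood's bound is easy to evaluate with integer arithmetic. The sketch below uses `math.isqrt` (available from Python 3.8), which is valid here because taking the floor of the square root first does not change the final floor. The function name is an illustrative assumption.

```python
from math import isqrt

def heawood_bound(euler_characteristic):
    """Evaluate floor((7 + sqrt(49 - 24 e)) / 2) for a surface with Euler characteristic e."""
    e = euler_characteristic
    if e > 2:
        raise ValueError("the Euler characteristic of a closed surface is at most 2")
    return (7 + isqrt(49 - 24 * e)) // 2

torus_bound = heawood_bound(0)             # torus: e = 0
projective_plane_bound = heawood_bound(1)  # projective plane: e = 1
```

For the torus the bound evaluates to 7, and for the projective plane to 6. Evaluating the same expression at $e = 2$ gives 4, although Heawood's argument does not cover the sphere.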


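Later it was proved in the works of Franklin, Ringel and Youngs that

$$H(S) \ge \left\lfloor \frac{7 + \sqrt{49 - 24 e(S)}}{2} \right\rfloor$$

for all of these surfaces except the Klein bottle, whose Heawood number is 6.


For example, the complete graph on 7 vertices can be embedded in the torus as follows: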
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51.2 Kuratowski’s theorem 


A finite graph is planar if and only if it contains no subgraph that is isomorphic to or is
a subdivision of $K_5$ or $K_{3,3}$, where $K_5$ is the complete graph of order 5 and $K_{3,3}$ is the
complete bipartite graph of order 6. Wagner’s theorem is an equivalent later result.


REFERENCES 
1. Kazimierz Kuratowski. Sur le probleme des courbes gauches en topologie. Fund. Math., 15:271- 


283, 1930. 
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51.3 Szemerédi-Trotter theorem 


The number of incidences of a set of $n$ points and a set of $m$ lines in the real plane $\mathbb{R}^2$ is
$I = O(n + m + (nm)^{2/3})$.


Proof. Let’s consider the points as vertices of a graph, and connect two vertices by an edge
if they are adjacent on some line. Then the number of edges is $e = I - m$. If $e < 4n$ then
we are done. If $e \ge 4n$ then by the crossing lemma

$$m^2 \ge \operatorname{cr}(G) \ge \frac{1}{64} \frac{(I - m)^3}{n^2},$$

and the theorem follows.


Recently, Tóth [1] extended the theorem to the complex plane $\mathbb{C}^2$. The proof is difficult.
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51.4 crossing lemma 


The crossing number of a graph $G$ with $n$ vertices and $m \ge 4n$ edges satisfies

$$\operatorname{cr}(G) \ge \frac{1}{64} \frac{m^3}{n^2}.$$
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51.5 crossing number 
The crossing number $\operatorname{cr}(G)$ of a graph $G$ is the minimal number of crossings among all
embeddings of $G$ in the plane.
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51.6 graph topology 


A graph $(V, E)$ is identified by its vertices $V = \{v_1, v_2, \ldots\}$ and its edges $E = \{\{v_i, v_j\}, \{v_k, v_l\}, \ldots\}$.


A graph also admits a natural topology, called the graph topology, by identifying every
edge $\{v_i, v_j\}$ with the unit interval $I = [0, 1]$ and gluing them together at coincident vertices.


This construction can be easily realized in the framework of simplicial complexes. We can
form a simplicial complex $G = \{\{v\} \mid v \in V\} \cup E$. The desired topological realization
of the graph is then just the geometric realization $|G|$ of $G$.


Viewing a graph as a topological space has several advantages: 


- The notion of graph isomorphism simply becomes that of homeomorphism.
- The notion of a connected graph coincides with topological connectedness.
- A connected graph is a tree iff its fundamental group is trivial.
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51.7 planar graph 


A planar graph is a graph which can be drawn on a plane (flat 2-d surface) with no edge
crossings.


No complete graphs above $K_4$ are planar. $K_4$, drawn without crossings, looks like:

[Figure: $K_4$ drawn in the plane without crossings.]

Hence it is planar (try this for $K_5$).


51.8 proof of crossing lemma 


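Euler’s formula implies the linear lower bound $\operatorname{cr}(G) \ge m - 3n + 6$, and so it cannot be used
directly. What we need is to consider the subgraphs of our graph, apply Euler’s formula on
them, and then combine the estimates. The probabilistic method provides a natural way to
do that.


Consider a minimal embedding of $G$. Choose independently every vertex of $G$ with probability
$p$. Let $G_p$ be the graph induced by those vertices, with $n_p$ vertices and $m_p$ edges, and let
$X_p$ be the number of crossings of the embedding that remain in $G_p$, so that $X_p \ge \operatorname{cr}(G_p)$.
By Euler’s formula, $\operatorname{cr}(G_p) - m_p + 3n_p \ge 0$. Taking expectations, we clearly have

$$E(\operatorname{cr}(G_p) - m_p + 3n_p) \ge 0.$$

Since $E(n_p) = pn$, $E(m_p) = p^2 m$ and $E(X_p) = p^4 \operatorname{cr}(G)$, we get an inequality that bounds
the crossing number of $G$ from below,

$$\operatorname{cr}(G) \ge p^{-2} m - 3 p^{-3} n.$$

Now set $p = \frac{4n}{m}$ (which is at most 1 since $m \ge 4n$), and the inequality becomes

$$\operatorname{cr}(G) \ge \frac{1}{64} \frac{m^3}{n^2}.$$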


Similarly, if $m \ge \frac{9}{2} n$, then we can set $p = \frac{9n}{2m}$ to get $\operatorname{cr}(G) \ge \frac{4}{243} \frac{m^3}{n^2}$.
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Chapter 52 


05C12 — Distance in graphs 


52.1 Hamming distance 


In comparing two bit patterns, the Hamming distance is the count of bits different in the two
patterns. More generally, if two ordered lists of items are compared, the Hamming distance
is the number of items that do not identically agree. This distance is applicable to encoded
information, and is a particularly simple metric of comparison, often more useful than the
city-block distance or Euclidean distance.
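Both forms of the definition translate into a line or two of code. The function names are illustrative; the sequence version assumes the two lists have equal length and raises an error otherwise, which is a design choice of this sketch rather than part of the definition.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def hamming_bits(x, y):
    """Number of differing bits between two non-negative integers."""
    return bin(x ^ y).count("1")

words = hamming("karolin", "kathrin")         # differs in 3 positions
bits = hamming_bits(0b1011101, 0b1001001)     # differs in 2 bits
```

The integer version relies on the fact that the exclusive-or of two bit patterns has a 1 exactly where they disagree.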


References 


- Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
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Chapter 53 


05C15 — Coloring of graphs and 
hypergraphs 


53.1 bipartite graph 


A bipartite graph is a graph with a chromatic number of 2.


The following graph, for example, is bipartite:

[Figure: a bipartite graph.]

One way to think of a bipartite graph is by partitioning the vertices into two disjoint sets
where vertices in one set are adjacent only to vertices in the other set. In the above graph,
this may be more obvious with a different representation:

[Figure: the same graph redrawn with its vertices in two columns.]

The two subsets are the two columns of vertices, all of which have the same colour.


A graph is bipartite if and only if all its cycles have even length. This is easy to see intuitively:
any path of odd length on a bipartite graph must end on a vertex of the opposite colour from the
beginning vertex and hence cannot be a cycle.
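The partition into two sets can be found, or shown not to exist, by a breadth-first search that alternates colours along edges. In this sketch a graph is a dictionary mapping each vertex to a list of its neighbours; the representation and the two small example graphs are assumptions made for illustration.

```python
from collections import deque

def two_colour(adj):
    """Return a dict vertex -> 0 or 1 with adjacent vertices differing, or None if impossible."""
    colour = {}
    for start in adj:
        if start in colour:
            continue                      # already reached from an earlier component
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None           # an odd cycle has been found
    return colour

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}   # cycle of length 4
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}            # cycle of length 3
```

The square receives a valid colouring, while the triangle, an odd cycle, yields `None`.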
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53.2 chromatic number 


The chromatic number of a graph is the minimum number of colours required to colour
it.


Consider the following graph:

[Figure: a graph on the vertices A, B, C, D, E, F, coloured with 3 colours.]

This graph has been coloured using 3 colours. Furthermore, it’s clear that it cannot be
coloured with fewer than 3 colours: it contains a subgraph (BCD) that is isomorphic
to the complete graph on 3 vertices. As a result, the chromatic number of this graph is indeed
3.


This example was easy to solve by inspection. In general, however, finding the chromatic
number of a large graph (and, similarly, an optimal colouring) is a very difficult (NP-hard)
problem.
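For very small graphs the chromatic number can be found by exhaustive search, which also makes the difficulty visible: the number of candidate colourings grows exponentially with the number of vertices. The function below is a brute-force sketch, not an efficient algorithm, and its name and the example graphs are illustrative assumptions.

```python
from itertools import product

def chromatic_number(vertices, edges):
    """Smallest k such that the graph has a proper colouring with k colours (brute force)."""
    vertices = list(vertices)
    if not vertices:
        return 0
    for k in range(1, len(vertices) + 1):
        # Try every assignment of k colours to the vertices.
        for assignment in product(range(k), repeat=len(vertices)):
            colour = dict(zip(vertices, assignment))
            if all(colour[u] != colour[v] for u, v in edges):
                return k

triangle = chromatic_number("BCD", [("B", "C"), ("C", "D"), ("B", "D")])
five_cycle = chromatic_number(range(5), [(i, (i + 1) % 5) for i in range(5)])
```

Both example graphs need three colours: the triangle because its vertices are pairwise adjacent, and the 5-cycle because it is an odd cycle.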
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53.3 chromatic number and girth 


A famous theorem of P. Erdős [1]:


Theorem 6. For any natural numbers $k$ and $g$, there exists a graph $G$ with chromatic number
$\chi(G) > k$ and girth $\operatorname{girth}(G) > g$.


We can easily have graphs with high chromatic numbers. For instance, the
complete graph $K_n$ trivially has $\chi(K_n) = n$; however $\operatorname{girth}(K_n) = 3$ (for $n \ge 3$). And
the cycle graph $C_n$ has $\operatorname{girth}(C_n) = n$, but

$$\chi(C_n) = \begin{cases} 1 & n = 1 \\ 2 & n \text{ even} \\ 3 & \text{otherwise.} \end{cases}$$


It seems intuitively plausible that a high chromatic number occurs because of short, “local”
cycles in the graph; it is hard to envisage how a graph with no short cycles can still have
high chromatic number.


Instead of envisaging, Erdős’ proof shows that, in some appropriately chosen probability space
on graphs with $n$ vertices, the probability of choosing a graph which does not have $\chi(G) > k$
and $\operatorname{girth}(G) > g$ tends to zero as $n$ grows. In particular, the desired graphs exist.


This seminal paper is probably the most famous application of the probabilistic method, and
is regarded by some as the foundation of the method [2]. Today the probabilistic method is
a standard tool for combinatorics. More constructive methods are often preferred, but are
almost always much harder.


[1] See the very readable P. Erdős, Graph theory and probability, Canad. J. Math. 11 (1959), 34-38.

[2] However, as always, with the benefit of hindsight we can see that the probabilistic method had been used
before. This does nothing to diminish the importance of the clear statement of the tool.


53.4 chromatic polynomial


Let $G$ be a graph (in the sense of graph theory) whose set $V$ of vertices is finite and nonempty,
and which has no loops or multiple edges. For any natural number $x$, let $\chi(G, x)$, or just $\chi(x)$,
denote the number of $x$-colorations of $G$, i.e. the number of mappings $f : V \to \{1, 2, \ldots, x\}$
such that $f(a) \ne f(b)$ for any pair $(a, b)$ of adjacent vertices. Let us prove that $\chi$ (which
is called the chromatic polynomial of the graph $G$) is a polynomial function in $x$ with
coefficients in $\mathbb{Z}$. Write $E$ for the set of edges in $G$. If $|E| = 0$, then trivially $\chi(x) = x^{|V|}$
(where $|\cdot|$ denotes the number of elements of a finite set). If not, then we choose an edge $e$
and construct two graphs having fewer edges than $G$: $H$ is obtained from $G$ by contracting
the edge $e$, and $K$ is obtained from $G$ by omitting the edge $e$. We have

$$\chi(G, x) = \chi(K, x) - \chi(H, x) \qquad (53.4.1)$$

for all $x \in \mathbb{N}$, because the polynomial $\chi(K, x)$ is the number of colorations of the vertices of
$G$ which might or might not be valid for the edge $e$, while $\chi(H, x)$ is the number which are
not valid. By induction on $|E|$, (53.4.1) shows that $\chi(G, x)$ is a polynomial over $\mathbb{Z}$.
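The recursion (53.4.1) can be run directly to evaluate $\chi(G, x)$ at a given natural number $x$. In this sketch edges are stored as two-element frozensets, so that any parallel edges produced by a contraction merge automatically, which does not change the number of colorations. The function name and the example graphs are illustrative assumptions, and the running time is exponential in the number of edges.

```python
def chromatic_value(vertices, edges, x):
    """Evaluate the chromatic polynomial at x by deletion and contraction."""
    vertices = frozenset(vertices)
    edges = frozenset(frozenset(e) for e in edges)
    if not edges:
        return x ** len(vertices)             # no edges: every mapping is a coloration
    e = next(iter(edges))
    u, v = tuple(e)
    deleted = edges - {e}                     # the graph K: omit the edge e
    # The graph H: contract e by renaming v to u in the remaining edges.
    contracted = frozenset(
        frozenset(u if w == v else w for w in f) for f in deleted
    )
    return (chromatic_value(vertices, deleted, x)
            - chromatic_value(vertices - {v}, contracted, x))

triangle_edges = [("a", "b"), ("b", "c"), ("a", "c")]
with_three_colours = chromatic_value("abc", triangle_edges, 3)
```

For the triangle the polynomial is $x(x-1)(x-2)$, so three colours give 6 colorations and two colours give none.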


By refining the argument a little, one can show

$$\chi(x) = x^{|V|} - |E|\, x^{|V|-1} + \cdots \pm s\, x^k,$$

for some nonzero integer $s$, where $k$ is the number of connected components of $G$, and the
coefficients alternate in sign.


With the help of the Möbius-Rota inversion formula (see Moebius inversion), or directly by
induction, one can prove

$$\chi(x) = \sum_{F \subseteq E} (-1)^{|F|} x^{|V| - r(F)},$$

where the sum is over all subsets $F$ of $E$, and $r(F)$ denotes the rank of $F$ in $G$, i.e. the
number of elements of any maximal cycle-free subset of $F$. (Alternatively, the sum may be
taken only over subsets $F$ such that $F$ is equal to the span of $F$; all other summands cancel
out in pairs.)


The chromatic number of $G$ is the smallest $x > 0$ such that $\chi(G, x) > 0$ or, equivalently,
such that $\chi(G, x) \ne 0$.


The Tutte polynomial of a graph, or more generally of a matroid $(E, r)$, is this function
of two variables:

$$t(x, y) = \sum_{F \subseteq E} (x - 1)^{r(E) - r(F)} (y - 1)^{|F| - r(F)}.$$


Compared to the chromatic polynomial, the Tutte polynomial contains more information about the
matroid. Still, two or more nonisomorphic matroids may have the same Tutte polynomial.
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53.5 colouring problem 


The colouring problem is to assign a colour to every vertex of a graph such that no two
adjacent vertices have the same colour. These colours, of course, are not necessarily colours
in the optic sense.


Consider the following graph:

[Figure: a graph on the vertices A, B, C, D, E, F.]

One potential colouring of this graph is:

[Figure: the same graph with its vertices coloured.]

A and C have the same colour; B and E have a second colour; and D and F have another.


Graph colouring problems have many applications in such situations as scheduling and
matching problems.
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53.6 complete bipartite graph 


The complete bipartite graph $K_{n,m}$ is a graph with two sets of vertices, one with $n$
members and one with $m$, such that each vertex in one set is adjacent to every vertex in the
other set and to no vertex in its own set. As the name implies, $K_{n,m}$ is bipartite.


Examples of complete bipartite graphs:


$K_{2,5}$:

[Figure: the complete bipartite graph $K_{2,5}$.]

$K_{3,3}$:

[Figure: the complete bipartite graph $K_{3,3}$.]
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53.7 complete k-partite graph 


The complete $k$-partite graph $K_{a_1, a_2, \ldots, a_k}$ is a $k$-partite graph with $a_1, a_2, \ldots, a_k$ vertices of each
colour wherein every vertex is adjacent to every other vertex with a different colour and to
no vertices with the same colour.


For example, the 3-partite complete graph $K_{2,3,4}$:
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53.8 four-color conjecture 


The four-color conjecture was a long-standing problem posed by Guthrie while coloring a
map of England. The conjecture states that every map on a plane or a sphere can be colored
using only four colors such that no two adjacent countries are assigned the same color. This
is equivalent to the statement that the chromatic number of every planar graph is no more than
four. After many unsuccessful attempts the conjecture was proven by Appel and Haken in
1976 with the aid of a computer.
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Interestingly, the seemingly harder problem of determining the number of colors 
needed for all surfaces other than the sphere was solved long before the four-color conjecture 


was settled. This number is now called the Heawood number of the surface.


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53.9 k-partite graph 


A k-partite graph is a graph with a chromatic number of k.

An alternate definition of a k-partite graph is a graph where the vertices are partitioned into
k subsets with the following conditions:

1. No two vertices in the same subset are adjacent.
2. There is no partition of the vertices with fewer than k subsets where condition 1 holds.

These two definitions are equivalent. Informally, we see that a colour can be assigned to all
the vertices in each subset, since they are not adjacent to one another. Furthermore, this is
also an optimal colouring since the second condition holds.


An example of a 4-partite graph:

[Figure: a 4-partite graph.]

A 2-partite graph is also called a bipartite graph.


Version: 5 Owner: vampyr Author(s): vampyr 




53.10 property B 


A hypergraph G is said to possess property B if it is 2-colorable, i.e., its vertices can be colored
in two colors so that no edge of G is monochromatic.
The property was named after Felix Bernstein by E. W. Miller.
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Chapter 54 


05C20 — Directed graphs (digraphs), 
tournaments 


54.1 cut 


On a digraph, define a sink to be a vertex with out-degree zero and a source to be a vertex
with in-degree zero. Let G be a digraph with non-negative weights and with exactly one
sink and exactly one source. A cut C on G is a subset of the edges such that every path
from the source to the sink passes through an edge in C. In other words, if we remove every
edge in C from the graph, there is no longer a path from the source to the sink.


Define the weight of $C$ as

$$W_C = \sum_{e \in C} W(e)$$

where $W(e)$ is the weight of the edge $e$.


Observe that we may achieve a trivial cut by removing all the edges of G. Typically, we are 
more interested in minimal cuts, where the weight of the cut is minimized for a particular 
graph. 
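
The definition can be tested directly: remove the candidate edges and look for a surviving path from the source to the sink. The following is a minimal sketch; the network `WEIGHTS` and the vertex names `s`, `a`, `b`, `t` are assumptions made up for illustration.

```python
from collections import deque

# Hypothetical weighted digraph with source "s" and sink "t" (illustration only).
WEIGHTS = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}


def reachable(edges, start):
    """Return the set of vertices reachable from start along directed edges."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in successors.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_cut(weights, cut, source, sink):
    """True if deleting the edges in cut leaves no path from source to sink."""
    remaining = [e for e in weights if e not in cut]
    return sink not in reachable(remaining, source)


def cut_weight(weights, cut):
    """Sum of W(e) over the edges e in the cut."""
    return sum(weights[e] for e in cut)
```

In this hypothetical network the two edges leaving `s` form a cut of weight 5, whereas the single edge from `a` to `t` is not a cut because the path through `b` survives.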
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54.2 de Bruijn digraph 


The vertices of the de Bruijn digraph B(n, m) are all possible words of length m − 1 chosen
from an alphabet of size n.

B(n, m) has $n^m$ edges consisting of each possible word of length m from an alphabet of size
n. The edge $a_1 a_2 \ldots a_m$ connects the vertex $a_1 a_2 \ldots a_{m-1}$ to the vertex $a_2 a_3 \ldots a_m$.


For example, B(2,4) could be drawn as:

[Figure: the de Bruijn digraph B(2,4), with the eight 3-bit words 000, ..., 111 as vertices and the sixteen 4-bit words 0000, ..., 1111 as edges.]


Notice that an Euler cycle on B(n, m) represents a shortest sequence of characters from an
alphabet of size n that includes every possible contiguous subsequence of m characters. For example,
the sequence 0000111101100101000 includes all 4-bit subsequences. Any de Bruijn digraph
must have an Euler cycle, since each vertex has in degree and out degree of n.
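
The connection between Euler cycles and such sequences can be made concrete. The sketch below builds the edges of B(n, m) and walks an Euler cycle with Hierholzer's algorithm (a standard method, swapped in here because the entry does not name one). The function names, the use of digit characters as the alphabet, and the restrictions n ≤ 10 and m ≥ 2 are all assumptions of this illustration.

```python
from itertools import product


def de_bruijn_edges(n, m):
    """Edges of B(n, m): each word of length m joins its prefix to its suffix."""
    alphabet = "0123456789"[:n]  # assumption: n <= 10, digits as characters
    return [("".join(w[:-1]), "".join(w[1:])) for w in product(alphabet, repeat=m)]


def euler_cycle(edges, start):
    """Hierholzer's algorithm: vertices of an Euler cycle, in order of traversal."""
    out = {}
    for u, v in edges:
        out.setdefault(u, []).append(v)
    stack, cycle = [start], []
    while stack:
        u = stack[-1]
        if out.get(u):
            stack.append(out[u].pop())  # follow an unused edge
        else:
            cycle.append(stack.pop())   # dead end: emit vertex while backtracking
    return cycle[::-1]


def de_bruijn_sequence(n, m):
    """Spell out the first vertex, then the last character of each later vertex."""
    edges = de_bruijn_edges(n, m)
    cycle = euler_cycle(edges, edges[0][0])
    return cycle[0] + "".join(v[-1] for v in cycle[1:])
```

For n = 2 and m = 4 the resulting string has length 3 + 16 = 19, and its sixteen windows of length 4 are exactly the sixteen 4-bit words.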
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54.3 directed graph 


A directed graph or digraph is a pair $(V, E)$ where $V$ is a set of vertices and $E$ is a
subset of $V \times V$ called edges or arcs.

If $E$ is symmetric (i.e., $(u, v) \in E$ if and only if $(v, u) \in E$), then the digraph is isomorphic
to an ordinary (that is, undirected) graph.

Digraphs are generally drawn in a similar manner to graphs with arrows on the edges to
indicate a sense of direction. For example, the digraph


({a, b, c, d}, {(a, b), (b, d), (b, c), (c, b), (c, c), (c, d)}) 


may be drawn as

[Figure: a drawing of this digraph on the vertices a, b, c, d.]
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54.4 flow 


On a digraph, define a sink to be a vertex with out-degree zero and a source to be a vertex
with in-degree zero. Let $G$ be a digraph with non-negative weights and with exactly one
sink and exactly one source. A flow on $G$ is an assignment $f : E(G) \to \mathbb{R}$ of values to each
edge of $G$ satisfying certain rules:

1. For any edge $e$, we must have $0 \le f(e) \le W(e)$ (where $W(e)$ is the weight of $e$).

2. For any vertex $v$, excluding the source and the sink, let $E_{in}$ be the set of edges incident
to $v$ and let $E_{out}$ be the set of edges incident from $v$. Then we must have

$$\sum_{e \in E_{in}} f(e) = \sum_{e \in E_{out}} f(e).$$


Let $E_{source}$ be the edges incident from the source, and let $E_{sink}$ be the set of edges incident
to the sink. If $f$ is a flow, then

$$\sum_{e \in E_{sink}} f(e) = \sum_{e \in E_{source}} f(e).$$

We will refer to this quantity as the amount of flow.

Note that a flow given by $f(e) = 0$ trivially satisfies these conditions. We are typically more
interested in maximum flows, where the amount of flow is maximized for a particular graph.

We may interpret a flow as a means of transmitting something through a network. Suppose
we think of the edges in a graph as pipes, with the weights corresponding to the capacities
of the pipes; we are pouring water into the system through the source and draining it through
the sink. Then the first rule requires that we do not pump more water through a pipe than
is possible, and the second rule requires that any water entering a junction of pipes must
leave. Under this interpretation, the maximum amount of flow corresponds to the maximum
amount of water we could pump through this network.

Instead of water in pipes, one may think of electric charge in a network of conductors. Rule
(2) above is one of Kirchhoff's two laws for such networks; the other says that the sum of the
voltage drops around any circuit is zero.
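
Rules (1) and (2) are easy to verify for a concrete assignment. The sketch below is illustrative: the network `WEIGHTS`, the assignment `F` and the vertex names are assumptions, and integer values are used so that the conservation test is exact.

```python
# Hypothetical network with source "s" and sink "t" (illustration only).
WEIGHTS = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
# A candidate flow on that network.
F = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}


def is_flow(weights, f, source, sink):
    """Check rule 1 (capacity) and rule 2 (conservation) for the assignment f."""
    if any(not (0 <= f[e] <= weights[e]) for e in weights):
        return False
    vertices = {v for e in weights for v in e}
    for v in vertices - {source, sink}:
        inflow = sum(f[(a, b)] for (a, b) in weights if b == v)
        outflow = sum(f[(a, b)] for (a, b) in weights if a == v)
        if inflow != outflow:
            return False
    return True


def amount_of_flow(weights, f, source):
    """Total value on the edges incident from the source."""
    return sum(f[(a, b)] for (a, b) in weights if a == source)
```

Here `F` satisfies both rules and carries an amount of flow of 5; the all-zero assignment is also a flow, with amount 0.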
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54.5 maximum flow/minimum cut theorem 


Let $G$ be a finite digraph with nonnegative weights and with exactly one sink and exactly
one source. Then

I) For any flow $f$ on $G$ and any cut $C$ of $G$, the amount of flow for $f$ is less than or equal to
the weight of $C$.

II) There exists a flow $f_0$ on $G$ and a cut $C_0$ of $G$ such that the flow of $f_0$ equals the weight
of $C_0$.


Proof: (I) is easy, so we prove only (II). Write $R$ for the set of nonnegative real numbers.
Let $V$ be the set of vertices of $G$. Define a matrix

$$\kappa : V \times V \to R$$

where $\kappa(x, y)$ is the sum of the weights (or capacities) of all the directed edges from $x$ to $y$.
By hypothesis there is a unique $v \in V$ (the source) such that

$$\kappa(x, v) = 0 \quad \forall x \in V$$

and a unique $w \in V$ (the sink) such that

$$\kappa(w, x) = 0 \quad \forall x \in V.$$

We may also assume $\kappa(x, x) = 0$ for all $x \in V$. Any flow $f$ will correspond uniquely (see
Remark below) to a matrix

$$\varphi : V \times V \to R$$

such that

$$\varphi(x, y) \le \kappa(x, y) \quad \forall x, y \in V$$

$$\sum_{z} \varphi(x, z) = \sum_{z} \varphi(z, x) \quad \forall x \ne v, w.$$

Let $\lambda$ be the matrix of any maximal flow, and let $A$ be the set of $x \in V$ such that there
exists a finite sequence $x_0 = v, x_1, \ldots, x_n = x$ such that for all $m$ from $0$ to $n - 1$, we have
either

$$\lambda(x_m, x_{m+1}) < \kappa(x_m, x_{m+1}) \tag{54.5.1}$$

or

$$\lambda(x_{m+1}, x_m) > 0. \tag{54.5.2}$$

Write $B = V - A$.

Trivially, $v \in A$. Let us show that $w \in B$. Arguing by contradiction, suppose $w \in A$, and let
$(x_m)$ be a sequence from $v$ to $w$ with the properties we just mentioned. Take a real number
$\varepsilon > 0$ such that

$$\varepsilon + \lambda(x_m, x_{m+1}) < \kappa(x_m, x_{m+1})$$

for all the (finitely many) $m$ for which (54.5.1) holds, and such that

$$\lambda(x_{m+1}, x_m) > \varepsilon$$

for all the $m$ for which (54.5.2) holds. But now we can define a matrix $\mu$ with a larger flow
than $\lambda$ (larger by $\varepsilon$) by:

$$\mu(x_m, x_{m+1}) = \varepsilon + \lambda(x_m, x_{m+1}) \quad \text{if (54.5.1) holds}$$

$$\mu(x_{m+1}, x_m) = \lambda(x_{m+1}, x_m) - \varepsilon \quad \text{if (54.5.2) holds}$$

$$\mu(a, b) = \lambda(a, b) \quad \text{for all other pairs } (a, b).$$

This contradiction shows that $w \in B$.

Now consider the set $C$ of pairs $(x, y)$ of vertices such that $x \in A$ and $y \in B$. Since $B$ is
nonempty, $C$ is a cut. But also, for any $(x, y) \in C$ we have

$$\lambda(x, y) = \kappa(x, y) \tag{54.5.3}$$

for otherwise we would have $y \in A$. Likewise $\lambda(y, x) = 0$, since otherwise (54.5.2) would put
$y$ in $A$, so nothing flows back from $B$ to $A$. Summing (54.5.3) over $C$, we see that the amount of
the flow $\lambda$ is the capacity of $C$. QED


Remark: We expressed the proof rather informally, because the terminology of graph theory 
is not very well standardized and cannot all be found yet here at PlanetMath. Please feel 
free to suggest any revision you think worthwhile. 
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54.6 tournament 


A tournament is a directed graph obtained by choosing a direction for each edge in an
undirected complete graph. For example, here is a tournament on 4 vertices:

[Figure: a tournament on 4 vertices.]

Any tournament on a finite number n of vertices contains a Hamiltonian path, i.e., a directed
path on all n vertices. This is easily shown by induction on n: suppose that the statement
holds for n, and consider any tournament $T$ on n + 1 vertices. Choose a vertex $v_0$ of $T$ and
consider a directed path $v_1, v_2, \ldots, v_n$ in $T \setminus \{v_0\}$. Now let $i \in \{0, \ldots, n\}$ be maximal such
that $v_j \to v_0$ for all $j$ with $1 \le j \le i$. Then

$$v_1, \ldots, v_i, v_0, v_{i+1}, \ldots, v_n$$

is a directed path as desired.
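
The induction step is constructive: each new vertex is inserted directly after the longest initial stretch of the current path whose members all point to it. The sketch below follows that idea; the function name, the `beats` predicate and the example tournament `ARROWS` are assumptions for illustration.

```python
# Hypothetical tournament on the vertices 0..3; (u, v) means u -> v.
ARROWS = {(0, 1), (1, 2), (2, 0), (0, 3), (1, 3), (3, 2)}


def beats(u, v):
    """True when the edge between u and v is directed from u to v."""
    return (u, v) in ARROWS


def hamiltonian_path(vertices, beats):
    """Build a Hamiltonian path in a tournament by repeated insertion."""
    path = []
    for v0 in vertices:
        # i is the length of the longest prefix of the path that points to v0.
        i = 0
        while i < len(path) and beats(path[i], v0):
            i += 1
        # path[i - 1] -> v0 by the choice of i, and v0 -> path[i] because
        # exactly one direction is present between any two vertices.
        path.insert(i, v0)
    return path
```

Every consecutive pair in the returned list is joined by an arrow in the forward direction, so the list is a directed path through all the vertices.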


The name “tournament” originates from such a graph's interpretation as the outcome of
some sports competition in which every player encounters every other player exactly once,
and in which no draws occur; let us say that an arrow points from the winner to the loser.
A player who wins all games would naturally be the tournament's winner. However, as the
above example shows, there might not be such a player; a tournament for which there isn't
is called a 1-paradoxical tournament. More generally, a tournament $T = (V, E)$ is called
k-paradoxical if for every k-subset $V'$ of $V$ there is a $v_0 \in V \setminus V'$ such that $v_0 \to v$ for all
$v \in V'$. By means of the probabilistic method Erdős showed that if $|V|$ is sufficiently large,
then almost every tournament on $V$ is k-paradoxical.
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Chapter 55 


05C25 — Graphs and groups 


55.1 Cayley graph 

Let $G = \langle X \mid R \rangle$ be a presentation of the group $G$ with generators $X$ and
relations $R$. We define the Cayley graph $\Gamma = \Gamma(G, X)$ of $G$ with generators $X$ as

$$\Gamma = (G, E),$$

where

$$E = \{\{u, a \cdot u\} \mid u \in G, a \in X\}.$$

That is, the vertices of the Cayley graph are precisely the elements of $G$, and two elements
of $G$ are connected by an edge iff some generator in $X$ transfers the one to the other.

Examples

1. $G = \mathbb{Z}^d$, with generators $X = \{e_1, \ldots, e_d\}$, the standard basis vectors. Then $\Gamma(G, X)$
is the d-dimensional grid; confusingly, it too is often termed “$\mathbb{Z}^d$”.

2. $G = F_d$, the free group with the d generators $X = \{g_1, \ldots, g_d\}$. Then $\Gamma(G, X)$ is the
2d-regular tree.
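
Both examples above are infinite, so for a quick computational illustration it is convenient to use a finite group instead. The sketch below builds the Cayley graph of the additive cyclic group of integers modulo n; this choice of group, the function name and the representation of edges as `frozenset` pairs are assumptions of the illustration, not part of the entry.

```python
def cayley_graph_cyclic(n, generators):
    """Cayley graph of the additive group Z_n: vertices 0..n-1, edges {u, a + u}."""
    vertices = list(range(n))
    edges = {frozenset((u, (a + u) % n)) for u in vertices for a in generators}
    return vertices, edges
```

With n = 6 and the single generator 1 the result is a 6-cycle. With the generators 2 and 3 it has 6 + 3 = 9 edges.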
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Chapter 56 


05C38 — Paths and cycles 


56.1 Euler path 


An Euler path along a connected graph with n vertices is a path connecting all n vertices,
and traversing every edge of the graph only once. Note that a vertex with an odd degree
can be passed through several times but must be either the first or the last vertex of the path,
while a vertex with an even degree is entered and left equally often, so an Euler path that is
not a circuit cannot end at a vertex with even degree. Thus, a connected graph has an Euler path which is a
circuit (an Euler circuit) if all of its vertices have even degree. A connected graph has an
Euler path which is not a circuit if it has exactly two vertices with odd degree.


This graph has an Euler path which is a circuit. All of its vertices are of even degree. 


This graph has an Euler path which is not a circuit. It has exactly two vertices of odd degree. 


Note that a graph must be connected to have an Euler path or circuit. A graph is connected 
if every pair of vertices u and z has a path uv,..., yz between them. 
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56.2 Veblen’s theorem 


The edge set of a graph can be partitioned into cycles if and only if every vertex has even degree.
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56.3 acyclic graph 


Any graph that contains no cycles is an acyclic graph. A directed acyclic graph is often
called a DAG for short.

For example, the following graph and digraph are acyclic.

[Figure: an acyclic graph and an acyclic digraph.]

In contrast, the following graph and digraph are not acyclic, because each contains a cycle.

[Figure: a graph and a digraph, each containing a cycle.]
56.4 bridges of Königsberg


The bridges of Königsberg is a famous problem inspired by an actual place and situation.
The solution of the problem, put forth by Leonhard Euler in 1736, is the first work of
graph theory and is responsible for the foundation of the discipline.


The following figure shows a portion of the Prussian city of Königsberg. A river passes 
through the city, and there are two islands in the river. Seven bridges cross between the 
islands and the mainland: 


Figure 1: Map of the Königsberg bridges. 


The mathematical problem arose when citizens of Königsberg noticed that one could not
take a stroll across all seven bridges, returning to the starting point, without crossing at
least one bridge twice.

Answering the question of why this is the case required a mathematical theory that didn't
exist yet: graph theory. This was provided by Euler, in a paper which is still available today.

To solve the problem, we must translate it into a graph-theoretic representation. We model
the land masses, A, B, C and D, as vertices in a graph. The bridges between the land
masses become edges. This generates from the above picture the following graph:


Figure 2: Graph-theoretic representation of the Königsberg bridges.


At this point, we can apply what we know about Euler paths and Euler circuits. Since an
Euler circuit for a graph exists only if every vertex has an even degree, and all four vertices
of the Königsberg graph have odd degree, the Königsberg graph
has no Euler circuit. Hence, we have explained why one cannot take a walk around
Königsberg and return to the starting point without crossing at least one bridge more than
once.
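
The degree argument can be replayed in a few lines. In the sketch below the seven bridges are listed as a multigraph; which land mass receives which letter is an assumption (the figure's labelling is not reproduced here), but the degree sequence 5, 3, 3, 3 does not depend on it. The function names are likewise illustrative.

```python
from collections import Counter

# The seven bridges as a multigraph. Assumption: A is the island joined to
# both river banks (B and C) by two bridges each; D is the other island.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def degrees(edges):
    """Degree of every vertex, counting parallel edges separately."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def euler_kind(edges):
    """Classify a connected multigraph by its number of odd-degree vertices."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    if odd == 0:
        return "circuit"
    if odd == 2:
        return "path"
    return "none"
```

All four vertices have odd degree, so the classification is "none": the graph has neither an Euler circuit nor an Euler path.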
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56.5 cycle 
A cycle in a graph or multigraph is a path from a vertex to itself (i.e., a
path where the first vertex is the same as the last vertex and no edge is repeated).
For example, consider this graph:

[Figure: a graph on the vertices A, B, C, D with the edges AB, BC, CD, DA and BD.]

ABCDA and BDAB are two of the cycles in this graph. ABA is not a cycle, however, since
it uses the edge connecting A and B twice. ABCD is not a cycle because it begins on A but
ends on D.

A cycle of length n is sometimes denoted $C_n$ and may be referred to as a polygon of n sides:
that is, $C_3$ is a triangle, $C_4$ is a quadrilateral, $C_5$ is a pentagon, etc.


An even cycle is one of even length; similarly, an odd cycle is one of odd length. 
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56.6 girth 


The girth of a graph G is the length of the shortest cycle in G.¹

For instance, the girth of any grid $\mathbb{Z}^d$ (where $d \ge 2$) is 4, and the girth of the vertex graph
of the dodecahedron is 5.
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56.7 path 


A path in a graph is a finite sequence of alternating vertices and edges, beginning and ending
with a vertex, $v_1 e_1 v_2 e_2 v_3 \ldots e_{n-1} v_n$ such that every consecutive pair of vertices $v_x$ and $v_{x+1}$
are adjacent and $e_x$ is incident with $v_x$ and with $v_{x+1}$. Typically, the edges may be omitted
when writing a path (e.g., $v_1 v_2 v_3 \ldots v_n$) since only one edge of a graph may connect two
adjacent vertices. In a multigraph, however, the choice of edge may be significant.

The length of a path is the number of edges in it.

Consider the following graph:

    A --- B
    |     |
    D --- C


Paths include (but are certainly not limited to) ABCD (length 3), ABCDA (length 4), and 
ABABABABADCBA (length 12). ABD is not a path since B is not adjacent to D. 
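
These examples can be checked mechanically. The sketch below takes the edge set AB, BC, CD, DA, which is the one implied by the examples in the text; the function names and the representation of a path as a string of vertex letters are assumptions for illustration.

```python
# Edges of the square graph, as unordered pairs of vertex letters.
EDGES = {frozenset("AB"), frozenset("BC"), frozenset("CD"), frozenset("DA")}


def is_path(word, edges):
    """True if every consecutive pair of vertices in word is adjacent."""
    return all(frozenset(pair) in edges for pair in zip(word, word[1:]))


def length(word):
    """Number of edges traversed by the path written as word."""
    return len(word) - 1
```

For instance, `is_path("ABD", EDGES)` is false because B and D are not adjacent, while the three paths listed above pass the check with lengths 3, 4 and 12.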


In a digraph, each consecutive pair of vertices must be connected by an edge with the proper
orientation; if e = (u, v) is an edge, but (v, u) is not, then uev is a valid path but veu is not.

Consider this digraph:

[Figure: a digraph on the vertices G, H, I, J.]

GHIJ, GJ, and GHGHGH are all valid paths. GHJ is not a valid path because H and
J are not connected. GJI is not a valid path because the edge connecting I to J has the
opposite orientation.

¹ There is no widespread agreement on the girth of a forest, which has no cycles. It is also extremely
unimportant.
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56.8 proof of Veblen’s theorem 


The proof is very easy by induction on the number of elements of the set E of edges. If E is
empty, then all the vertices have degree zero, which is even. Suppose E is nonempty. If the
graph contains no cycle, then some vertex has degree 1, which is odd. Finally, if the graph
does contain a cycle C, then every vertex has the same degree mod 2 with respect to E − C
as it has with respect to E, and we can conclude by induction.
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Chapter 57 


05C40 — Connectivity 


57.1 k-connected graph 


For $k \in \mathbb{N}$, a graph $G$ is k-connected iff $G$ has more than $k$ vertices and the graph left by
removing any fewer than $k$ vertices is connected. The largest integer $k$ such that $G$ is k-connected
is called the connectivity of $G$ and is denoted by $\kappa(G)$.
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57.2 'Thomassen’s theorem on 3-connected graphs 


Every 3-connected graph $G$ with more than 4 vertices has an edge $e$ such that $G/e$ is also
3-connected.

Suppose such an edge doesn't exist. Then, for every edge $e = xy$, the graph $G/e$ isn't
3-connected and can be made disconnected by removing 2 vertices. Since $\kappa(G) \ge 3$, our
contracted vertex $v_{xy}$ has to be one of these two. So for every edge $e$, $G$ has a vertex $z \ne x, y$
such that $\{v_{xy}, z\}$ separates $G/e$. Any 2 vertices separated by $\{v_{xy}, z\}$ in $G/e$ are separated
in $G$ by $S := \{x, y, z\}$. Since the minimal size of a separating set is 3, every vertex in $S$ has
an adjacent vertex in every component of $G - S$.

Now we choose the edge $e$, the vertex $z$ and the component $C$ such that $|C|$ is minimal. We
also choose a vertex $v$ adjacent to $z$ in $C$.

By construction $G/zv$ is not 3-connected since removing $xy$ disconnects $C - v$ from $G/zv$.
So there is a vertex $w$ such that $\{z, v, w\}$ separates $G$ and as above every vertex in $\{z, v, w\}$
has an adjacent vertex in every component of $G - \{z, v, w\}$. We now consider a component
$D$ of $G - \{z, v, w\}$ that doesn't contain $x$ or $y$. Such a component exists since $x$ and $y$ belong
to the same component and $G - \{z, v, w\}$ isn't connected. Any vertex adjacent to $v$ in $D$
is also an element of $C$ since $v$ is an element of $C$. This means $D$ is a proper subset of $C$,
which contradicts our assumption that $|C|$ was minimal.
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57.3 Tutte’s wheel theorem 


Every 3-connected simple graph can be constructed starting from a wheel graph by repeatedly
either adding an edge between two non-adjacent vertices or splitting a vertex.
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57.4 connected graph 


A connected graph is a graph such that there exists a path between all pairs of vertices.
If the graph is a directed graph, and there exists a path from each vertex to every other
vertex, then it is a strongly connected graph.

A connected component is a subset of vertices of any graph and any edges between them
that forms a connected graph. Similarly, a strongly connected component is a subset
of vertices of any digraph and any edges between them that forms a strongly connected
graph. Any graph or digraph is a union of connected or strongly connected components,
plus some edges to join the components together. Thus any graph can be decomposed
into its connected or strongly connected components. For instance, Tarjan's algorithm can
decompose any digraph into its strongly connected components.
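
For undirected graphs the decomposition needs nothing more than a graph search. The sketch below is a plain depth-first search, not Tarjan's algorithm, and handles only the undirected case; the function name and the input format (an iterable of vertices and a list of vertex pairs) are assumptions.

```python
def connected_components(vertices, edges):
    """Connected components of an undirected graph, as a list of vertex sets."""
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Grow the component of start by exploring unvisited neighbours.
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in neighbours[u] - component:
                component.add(w)
                stack.append(w)
        seen |= component
        components.append(component)
    return components
```

On the vertices A to F with the edges AB, BC, DE and EF this returns the two components {A, B, C} and {D, E, F}.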


For example, the following graph and digraph are connected and strongly connected, respectively.

[Figure: a connected graph and a strongly connected digraph.]

On the other hand, the following graph is not connected, and consists of the union of two
connected components.

[Figure: a graph on the vertices A, ..., F with two connected components.]

The following digraph is not strongly connected, because there is no way to reach F from
other vertices, and there is no vertex reachable from C.

[Figure: a digraph that is not strongly connected.]

The three strongly connected components of this graph are

[Figure: the three strongly connected components.]
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57.5 cutvertex 
A cutvertex of a graph G is a vertex whose deletion increases the number of components
of G. The edge analogue of a cutvertex is a bridge.
Version: 2 Owner: digitalis Author(s): digitalis 
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Chapter 58 


05C45 — Eulerian and Hamiltonian 
graphs 


58.1 Bondy and Chvátal theorem


Bondy and Chvátal's theorem.
Let $G$ be a graph of order $n \ge 3$ and suppose that $u$ and $v$ are distinct non-adjacent vertices
such that $\deg(u) + \deg(v) \ge n$.

Then $G$ is Hamiltonian if and only if $G + uv$ is Hamiltonian.
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58.2 Dirac theorem 


Theorem: Every graph with $n \ge 3$ vertices and minimum degree at least $n/2$ has a Hamiltonian cycle.

Proof: Let $G = (V, E)$ be a graph with $|G| = n \ge 3$ and $\delta(G) \ge n/2$. Then $G$ is connected:
otherwise, the degree of any vertex in the smallest component $C$ of $G$ would be less than
$|C| \le n/2$. Let $P = x_0 \ldots x_k$ be a longest path in $G$. By the maximality of $P$, all the neighbours
of $x_0$ and all the neighbours of $x_k$ lie on $P$. Hence at least $n/2$ of the vertices $x_0, \ldots, x_{k-1}$ are
adjacent to $x_k$, and at least $n/2$ of these same $k < n$ vertices $x_i$ are such that $x_0 x_{i+1} \in E$. By the
pigeon hole principle, there is a vertex $x_i$ that has both properties, so we have $x_0 x_{i+1} \in E$
and $x_i x_k \in E$ for some $i < k$. We claim that the cycle $C := x_0 x_{i+1} P x_k x_i P x_0$ is a Hamiltonian
cycle of $G$. Indeed, since $G$ is connected, $C$ would otherwise have a neighbour in $G - C$, which
could be combined with a spanning path of $C$ into a path longer than $P$.
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58.3 Euler circuit 


An Euler circuit is a connected graph such that starting at a vertex a, one can traverse along
every edge of the graph once to each of the other vertices and return to vertex a. In other
words, an Euler circuit is an Euler path that is a circuit. Thus, using the properties of odd
and even degree vertices given in the definition of an Euler path, an Euler circuit exists
iff every vertex of the graph has an even degree.


This graph is an Euler circuit as all vertices have degree 2. 


This graph is not an Euler circuit. 
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58.4 Fleury’s algorithm 


Fleury's algorithm constructs an Euler circuit in a graph (if it's possible).

1. Pick any vertex to start.

2. From that vertex pick an edge to traverse, considering the following rule: never cross a
bridge of the reduced graph unless there is no other choice.

3. Darken that edge, as a reminder that you can't traverse it again.

4. Travel that edge, coming to the next vertex.

5. Repeat 2-4 until all edges have been traversed, and you are back at the starting vertex.


By “reduced graph” we mean the original graph minus the darkened (already used) edges. 
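
The five steps can be sketched as follows. This is an illustration under stated assumptions, not a tuned implementation: the graph is given as a list of undirected vertex pairs, it is assumed to be connected with every vertex of even degree, and the bridge test (the helper names `reach_count` and `is_bridge` are made up here) simply compares how many vertices are reachable with and without the edge.

```python
def fleury(edges, start):
    """Return an Euler circuit as a list of vertices, following Fleury's rule.

    Assumes a connected graph in which every vertex has even degree.
    """
    remaining = list(edges)  # the reduced graph: edges not yet darkened

    def reach_count(v, edge_list):
        """Number of vertices reachable from v using the given edges."""
        seen, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for a, b in edge_list:
                for x, y in ((a, b), (b, a)):
                    if x == u and y not in seen:
                        seen.add(y)
                        stack.append(y)
        return len(seen)

    def is_bridge(edge):
        """An edge is a bridge if deleting it shrinks what its endpoint can reach."""
        without = list(remaining)
        without.remove(edge)
        return reach_count(edge[0], without) < reach_count(edge[0], remaining)

    circuit, current = [start], start
    while remaining:
        incident = [e for e in remaining if current in e]
        # Step 2: prefer a non-bridge; cross a bridge only if there is no other choice.
        choice = next((e for e in incident if not is_bridge(e)), incident[0])
        remaining.remove(choice)  # step 3: darken the edge
        current = choice[1] if choice[0] == current else choice[0]  # step 4
        circuit.append(current)
    return circuit
```

On a graph made of two triangles sharing a vertex, the routine returns a closed walk that starts and ends at the chosen vertex and uses each of the six edges exactly once.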
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58.5 Hamiltonian cycle 
Let G be a graph. If there's a cycle visiting all vertices exactly once, we say that the cycle is
a Hamiltonian cycle.
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58.6 Hamiltonian graph 


Let G be a graph or digraph.
If G has a Hamiltonian cycle, we call G a Hamiltonian graph.

There is no useful necessary and sufficient condition known for a graph being Hamiltonian. However,
we can get some necessary conditions from the definition: for example, a Hamiltonian graph is always
connected and has order at least 3. This and other observations lead to the condition:

Let G = (V, E) be a graph of order at least 3. If G is Hamiltonian, then for every nonempty proper subset
U of V, the subgraph induced by V − U has at most |U| components.

For the sufficiency conditions, we get results like Ore's theorem or the Bondy and Chvátal theorem.
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58.7 Hamiltonian path 
Let G be a graph. A path on G that includes every vertex exactly once is called a
Hamiltonian path.
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58.8 Ore’s theorem 
Let $G$ be a graph of order $n \ge 3$ such that, for every pair of distinct non-adjacent vertices $u$
and $v$, $\deg(u) + \deg(v) \ge n$. Then $G$ is a Hamiltonian graph.
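
The hypothesis of the theorem is straightforward to test on a small graph. In the sketch below the function name and the convention that vertices are labelled 0, ..., n − 1 are assumptions for illustration.

```python
from itertools import combinations


def satisfies_ore(n, edges):
    """Check deg(u) + deg(v) >= n for every non-adjacent pair among vertices 0..n-1."""
    adjacent = {frozenset(e) for e in edges}
    degree = [sum(1 for e in adjacent if v in e) for v in range(n)]
    return n >= 3 and all(
        degree[u] + degree[v] >= n
        for u, v in combinations(range(n), 2)
        if frozenset((u, v)) not in adjacent
    )
```

The 4-cycle satisfies the condition. The 5-cycle does not, although it is plainly Hamiltonian, which illustrates that the condition is sufficient but not necessary.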
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58.9 Petersen graph 


Petersen's graph. An example of a graph that is traceable but not Hamiltonian. That is, it
has a Hamiltonian path but doesn't have a Hamiltonian cycle.

This is also the canonical example of a hypohamiltonian graph.
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58.10 hypohamiltonian 


A graph $G$ is hypohamiltonian if $G$ is not Hamiltonian but $G - v$ is Hamiltonian for each
$v \in V$ ($V$ the vertex set of $G$). The smallest hypohamiltonian graph is the Petersen graph,
which has ten vertices.
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58.11 traceable 


Let G be a graph. If G has a Hamiltonian path, we say that G is traceable.
Not every traceable graph is Hamiltonian. As an example consider Petersen's graph.
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Chapter 59 


05C60 — Isomorphism problems 
(reconstruction conjecture, etc.) 


59.1 graph isomorphism 


A graph isomorphism is a bijection between the vertices of two graphs $G$ and $H$:

$$f : V(G) \to V(H)$$

with the property that any two vertices $u$ and $v$ from $G$ are adjacent if and only if $f(u)$ and
$f(v)$ are adjacent in $H$.


If an isomorphism can be constructed between two graphs, then we say those graphs are 
isomorphic. 
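
For very small graphs an isomorphism can be found by trying every bijection. The sketch below is a brute-force illustration only (its running time grows factorially with the number of vertices); the function name and input format are assumptions, and the graphs are assumed to be simple, with no loops or parallel edges.

```python
from itertools import permutations


def find_isomorphism(vertices_g, edges_g, vertices_h, edges_h):
    """Return a dict f: V(G) -> V(H) that is an isomorphism, or None if none exists."""
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    if len(vertices_g) != len(vertices_h) or len(eg) != len(eh):
        return None
    for image in permutations(vertices_h):
        f = dict(zip(vertices_g, image))
        # With equally many edges, mapping every edge of G onto an edge of H
        # already forces adjacency to be preserved in both directions.
        if all(frozenset((f[u], f[v])) in eh for u, v in eg):
            return f
    return None
```

For the path a-b-c and the path 1-3-2, any isomorphism found must send the middle vertex b to 3. A star and a path on four vertices have the same number of edges, yet no bijection works, so the function returns `None`.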


For example, consider these two graphs:

[Figure: two graphs, the first with vertices labelled by letters (a, b, c, d, g, h, i, j) and the second with numbered vertices.]

Although these graphs look very different at first, they are in fact isomorphic; one isomorphism
between them is

[Display: the vertex-by-vertex values of the mapping f.]
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Chapter 60 


05C65 — Hypergraphs 


60.1 Steiner system 


A Steiner system S(t, k, n) is a k-uniform hypergraph on n vertices such that every set
of t vertices is contained in exactly one edge. Notice that S(2, k, n) are merely k-uniform
linear spaces. The families of hypergraphs S(2, 3, n) are known as Steiner triple systems.
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60.2 finite plane 


Let $\mathcal{H} = (V, \mathcal{E})$ be a linear space. A finite plane is an intersecting linear space. That is to
say, a linear space in which any two edges in $\mathcal{E}$ have a nonempty intersection.

Finite planes are rather restrictive hypergraphs, and the following holds.

Theorem 4. Let $\mathcal{H} = (V, \mathcal{E})$ be a finite plane. Then for some positive integer $k$, $\mathcal{H}$ is
$(k + 1)$-regular, $(k + 1)$-uniform, and $|\mathcal{E}| = |V| = k^2 + k + 1$.


The above $k$ is the order of the finite plane. It is not known in general if finite planes exist
of order other than $k$ a power of a prime. The terminology “finite plane” is suggestive, as we
can think of the edges as a finite collection of lines in the Euclidean plane in general position,
so that they all intersect pairwise in exactly one point. The added restriction that all pairs
of vertices determine an edge (i.e. “any two points determine a line”), however, makes it
impossible to depict a finite plane in the Euclidean plane by means of straight lines, except
for the trivial case $k = 1$. The finite plane of order 2 is known as the Fano plane. The
following is a diagrammatic representation:

[Figure: the Fano plane.]

An edge here is represented by a straight line, and the inscribed circle is also an edge. In
other words, for a vertex set {1, 2, 3, 4, 5, 6, 7}, the edges of the Fano plane are


{1, 2,4} 
{2,3,5} 
{3,4,6} 
{4,5,7} 
{5,6,1} 
{6,7,2} 
{7, 1,3} 


Notice that the Fano plane is generated by the ordered triplet (1, 2, 4) and adding 1 to each
entry, modulo 7. The generating triplet has the property that the differences of any two
elements, in either order, are all pairwise different modulo 7. In general, if we can find a set
of $k + 1$ elements of $\mathbb{Z}_{k^2+k+1}$ (the integers modulo $k^2 + k + 1$) with all pairwise differences
distinct, then this gives a cyclic representation of the finite plane of order $k$.
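
The cyclic construction is short enough to verify by machine. In the sketch below the function names are assumptions, and residues are written as 1, ..., modulus (rather than 0, ..., modulus − 1) so that the output matches the edge list above.

```python
from itertools import combinations


def cyclic_plane(base, modulus):
    """Translate the base set by 0, 1, ..., modulus - 1, using residues 1..modulus."""
    return [frozenset((x + shift - 1) % modulus + 1 for x in base)
            for shift in range(modulus)]


def is_linear_space(points, lines):
    """True if every pair of points lies in exactly one line."""
    return all(sum(1 for line in lines if {p, q} <= line) == 1
               for p, q in combinations(points, 2))


def differences_distinct(base, modulus):
    """True if the differences a - b (a != b) are pairwise different modulo modulus."""
    diffs = [(a - b) % modulus for a in base for b in base if a != b]
    return len(diffs) == len(set(diffs))
```

With base (1, 2, 4) and modulus 7 this reproduces the seven edges listed above. The base (1, 2, 4, 10) modulo 13 passes the same checks, giving a cyclic representation for order 3.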
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60.3 hypergraph 


A hypergraph $\mathcal{H}$ is an ordered pair $(V, \mathcal{E})$ where $V$ is a set of vertices and $\mathcal{E}$ is a set of
edges such that $\mathcal{E} \subseteq \mathcal{P}(V)$. In other words, an edge is nothing more than a set of vertices.

Sometimes it is desirable to restrict this definition more. The empty hypergraph is not very
interesting, so we usually accept that $V \ne \emptyset$. Singleton edges are allowed in general, but
not the empty one. Most applications consider only finite hypergraphs, but occasionally it
is also useful to allow $V$ to be infinite.

Many of the definitions of graphs carry verbatim to hypergraphs. $\mathcal{H}$ is said to be k-uniform
if every edge $e \in \mathcal{E}$ has cardinality $k$. The degree of a vertex $v$ is the number of edges in
$\mathcal{E}$ that contain this vertex, often denoted $d(v)$. $\mathcal{H}$ is k-regular if every vertex has degree $k$.
Notice that an ordinary graph is merely a 2-uniform hypergraph.

Let $V = \{v_1, v_2, \ldots, v_n\}$ and $\mathcal{E} = \{e_1, e_2, \ldots, e_m\}$. Associated to any hypergraph is the
$n \times m$ incidence matrix $A = (a_{ij})$ where

$$a_{ij} = \begin{cases} 1 & \text{if } v_i \in e_j \\ 0 & \text{otherwise} \end{cases}$$


The transpose $A^t$ of the incidence matrix also defines a hypergraph $\mathcal{H}^*$, the dual of $\mathcal{H}$, in
an obvious manner. To be explicit, let $\mathcal{H}^* = (V^*, \mathcal{E}^*)$ where $V^*$ is an m-element set and $\mathcal{E}^*$
is an n-element set of subsets of $V^*$. For $v_j^* \in V^*$ and $e_i^* \in \mathcal{E}^*$, $v_j^* \in e_i^*$ if and only if $a_{ij} = 1$.

Notice, of course, that the dual of a uniform hypergraph is regular and vice-versa. It is not
rare to see fruitful results emerge by considering the dual of a hypergraph.
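
The incidence matrix and the dual can be computed directly from the definitions. In the following sketch the function names are assumptions, edges are given as Python sets, and the vertices of the dual are taken to be the column indices 0, ..., m − 1.

```python
def incidence_matrix(vertices, edges):
    """n x m matrix with entry 1 where vertices[i] lies in edges[j], else 0."""
    return [[1 if v in e else 0 for e in edges] for v in vertices]


def dual(vertices, edges):
    """Edges of the dual hypergraph: row i of A read as a set of column indices."""
    a = incidence_matrix(vertices, edges)
    return [frozenset(j for j, entry in enumerate(row) if entry) for row in a]
```

For the 3-uniform hypergraph on {1, 2, 3, 4} with edges {1, 2, 3} and {2, 3, 4}, each of the two dual vertices lies in three dual edges, so the dual is 3-regular, as the remark on uniform hypergraphs predicts.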
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60.4 linear space 


A hypergraph $\mathcal{H} = (V, \mathcal{E})$ is a linear space if any pair of vertices is found in exactly one
edge. This usage of the term has no relation to its occasional appearance in linear algebra
as a synonym for a vector space.


The following two observations are often useful for linear spaces. Let $n = |V|$ and pick some
arbitrary $v \in V$. Then

$$\sum_{e \in \mathcal{E}} \binom{|e|}{2} = \binom{n}{2}$$

$$\sum_{e \ni v} (|e| - 1) = n - 1$$

where the second sum is taken over all edges containing $v$.

The first property follows because every pair of vertices is contained in precisely one edge.
For a fixed edge $e$, $\binom{|e|}{2}$ counts all the pairs of vertices in $e$, and this summed over all edges
gives all the possible pairs of vertices, which is $\binom{n}{2}$. The second one holds because given
any vertex $v$, this vertex forms exactly one edge with every other vertex, so $|e| - 1$ counts the
number of vertices $v$ shares in edge $e$, which summed over all edges containing $v$ gives all the
vertices except $v$.
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Chapter 61 


05C69 — Dominating sets, 
independent sets, cliques 


61.1 Mantel’s theorem 


Every graph of order n and size greater than $\lfloor n^2/4 \rfloor$ contains a triangle (cycle of order 3).
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61.2 clique 


A maximal complete subgraph of a graph is a clique, and the clique number $\omega(G)$ of a
graph $G$ is the maximal order of a clique in $G$. Simply, $\omega(G)$ is the maximal order of a
complete subgraph of $G$. Some authors however define a clique as any complete subgraph of
$G$ and refer to the other definition as maximum clique.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
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61.3 proof of Mantel’s theorem 


Let's consider a graph $G$ with $n$ vertices and no triangles, and find a characterization of this
kind of graph which distinguishes it from those graphs which have $n$ vertices and at
least one triangle. Let each vertex $v_i$, $i = 1, 2, \ldots, n$, have its own nonnegative weight $w_i$ such that $\sum w_i = 1$. Let
$S = \sum w_i w_j$, summed over all edges $(i, j)$ in $E(G)$. Now take two vertices $h$, $k$ that are not joined. Let $x$ be the total weight
of the neighbours of $h$ and $y$ that of the neighbours of $k$, and let's assume $x \ge y$. If we take a little portion of
weight from the vertex $k$ and add it to the weight of $h$, then this shifting won't decrease
$S$, so $S$ is maximal when all the weight is concentrated on a $K_2$, that is, a complete subgraph
made up of two vertices. Therefore, since $S = vw$ and $v + w = 1$, the maximum is attained at $w = v = 1/2$ and
$S \le 1/4$. The theorem is proven by considering all the weights equal to $1/n$. In fact, in this case
$S = |E(G)|/n^2$ and so $|E(G)| \le n^2/4$.
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Chapter 62 


05C70 — Factorization, matching, 
covering and packing 


62.1 Petersen theorem 


If G is a 3-regular 2-edge connected graph, then G has a complete matching.

62.2 Tutte theorem


Let G(V, E) be any (finite) graph. G has a complete matching iff
$\forall X \subseteq V(G): c_o(G - X) \le |X|$, where $c_o(H)$ is the number of the components of a graph H
with an odd number of points.

62.3 bipartite matching


A matching on a bipartite graph is called a bipartite matching. Bipartite matchings have
many interesting properties.


Matrix form. Suppose we have a bipartite graph G and we partition the vertices into two
sets, $V_1$ and $V_2$, so that every edge joins a vertex of $V_1$ to a vertex of $V_2$. We may then
represent the graph with a simplified adjacency matrix with $|V_1|$ rows and $|V_2|$ columns,
containing a 1 where an edge joins the corresponding vertices and a 0 where there is no edge.


We say that two 1s in the matrix are line-independent if they are not in the same row or
column. Then a matching on the graph will be a subset of the 1s in the matrix that are all
line-independent.


For example, consider this bipartite graph (the thickened edges are a matching): 


The graph could be represented as the matrix

$$\begin{pmatrix}
1^* & 1 & 0 & 1 & 0 \\
1 & 0 & 1^* & 0 & 0 \\
0 & 0 & 1 & 0 & 1^* \\
0 & 1^* & 1 & 0 & 1
\end{pmatrix}$$

where a 1* indicates an edge in the matching. Note that all the starred 1s are
line-independent.


A complete matching on a bipartite graph $G(V_1, V_2, E)$ is one that saturates all of the
vertices in $V_1$, that is, every vertex of $V_1$ is incident with an edge of the matching.


Systems of distinct representatives. A system of distinct representatives (SDR) is equivalent
to a maximal matching on some bipartite graph. Let $V_1$ and $V_2$ be the two sets of vertices in
the graph with $|V_1| \le |V_2|$. Consider the family of sets $\{\Gamma(v) : v \in V_1\}$, which includes the
neighborhood of every vertex in $V_1$. An SDR for this family will be a unique choice of a vertex
in $V_2$ for each vertex in $V_1$. There must be an edge joining these vertices; the set of all such
edges forms a matching.


Consider the sets

$S_1 = \{A, B, D\}$
$S_2 = \{A, C\}$
$S_3 = \{C, E\}$
$S_4 = \{B, C, E\}$

One SDR for these sets is

$A \in S_1$
$C \in S_2$
$E \in S_3$
$B \in S_4$

Note that this is the same matching on the graph shown above.


Finding bipartite matchings. One method for finding maximal bipartite matchings involves
using a network flow algorithm. Before using it, however, we must modify the graph.


Start with a bipartite graph G. As usual, we consider the two sets of vertices $V_1$ and $V_2$.
Replace every edge in the graph with a directed arc from $V_1$ to $V_2$ of capacity L, where L is
some large integer.


Invent two new vertices: the source and the sink. Add a directed arc of capacity 1 from the
source to each vertex in $V_1$. Likewise, add a directed arc of capacity 1 from each vertex in
$V_2$ to the sink.


Now find the maximum flow from the source to the sink. The total weight of this flow will
be the size of the maximum matching on G. Similarly, the set of edges of G with non-zero
flow will constitute a matching.


There also exist algorithms specifically for finding bipartite matchings that avoid the
overhead of setting up a weighted digraph suitable for network flow.
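The construction above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: it assumes a breadth-first augmenting-path search (the Edmonds-Karp variant of Ford-Fulkerson), and the labels `"source"` and `"sink"`, the tags `"L"` and `"R"`, and the function name `matching_size_via_flow` are choices made here for the example.

```python
from collections import deque

def matching_size_via_flow(v1, v2, edges):
    """Size of a maximum matching of a bipartite graph, computed as a max flow."""
    source, sink = "source", "sink"   # the two invented vertices
    big = len(v1) + len(v2) + 1       # plays the role of the large capacity L
    cap = {}

    def add_arc(u, v, c):
        cap.setdefault(u, {})[v] = c
        cap.setdefault(v, {}).setdefault(u, 0)   # reverse arc for the residual graph

    for u in v1:
        add_arc(source, ("L", u), 1)
    for w in v2:
        add_arc(("R", w), sink, 1)
    for u, w in edges:
        add_arc(("L", u), ("R", w), big)

    flow = 0
    while True:
        # Breadth-first search for a source-sink path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Every such path uses a capacity-1 arc, so exactly one unit is pushed.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1
```

Tagging the vertices with `"L"` and `"R"` keeps the two sides disjoint even if the same label occurs in both $V_1$ and $V_2$.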

62.4 edge covering

Let G be a graph. An edge covering C on G is a subset of the vertices of G such that each
edge in G is incident with at least one vertex in C.


For any graph, the vertex set is a trivial edge covering. Generally, we are more interested
in minimal coverings. A minimal edge covering is simply an edge covering of the least
possible size.

62.5 matching

Let G be a graph. A matching M on G is a subset of the edges of G such that each vertex
in G is incident with no more than one edge in M.


It is easy to find a matching on a graph; for example, the empty set will always be a matching.
Typically, the most interesting matchings are maximal matchings. A maximal matching
on a graph G is simply a matching of the largest possible size.

62.6 maximal bipartite matching algorithm


The maximal bipartite matching algorithm is similar in some ways to the Ford-Fulkerson
algorithm for network flow. This is not a coincidence; network flows and matchings are closely
related. This algorithm, however, avoids some of the overhead associated with finding
network flow.


The basic idea behind this algorithm is as follows:


1. Start with some (not necessarily maximal) matching M.


2. Find a path that alternates, starting with an edge $e_1 \notin M$, followed by an edge $e_2 \in M$,
and so on, ending with some edge $e_f \notin M$.


3. For each edge e in the path, add e to M if e ∉ M or remove e from M if e ∈ M. Note
that this must increase |M| by 1.


4. Repeat until we can no longer augment the matching in this manner.


The algorithm employs a clever labeling trick to find these paths and to ensure that the set
of edges chosen remains a valid matching.


The algorithm as described here uses the matrix form of a bipartite graph. Translating the
matching from a matrix to a graph is straightforward.


There are two phases to this algorithm: labeling and flipping. 


Labeling. We begin with a matrix with R rows and C columns containing 0s, 1s, and 1*s,
where a 1* indicates an edge in the matching and a 1 indicates an edge not in the matching.
Number the columns 1...C and number the rows 1...R.


Start by labeling each column that contains no 1*s with the symbol #.


Now we scan the columns. Scan each column i that has been labelled but not scanned. Find
each 1 in column i that is in an unlabelled row; label this row i. Mark column i as scanned.


Next, we scan the rows. Scan each row j that has been labelled but not scanned. Find the
first 1* in row j. Label the column in which it appears j, and mark row j as scanned. If
there is no 1* in row j, proceed to the flipping phase.


Otherwise, go back to column scanning. Continue scanning and labelling until there are no
labelled, unscanned rows or columns; at that point, the set of 1*s is a maximal matching.

Flipping. We enter the flip phase when we scan some row j that contains no 1*. This row
must have some label c, and in column c, row j of the matrix, there must be a 1; change
this to a 1*.


Now consider column c; it has some label r. If r is #, then stop; go back to the labelling
phase. Otherwise, change the 1* at column c, row r to a 1.


Move on to row r and continue the process.


Notes. The algorithm must begin with some matching; we may begin with the empty set
(or a single edge), since that is always a matching. However, each pass through the
process increases the size of the matching by exactly one. Therefore, we can make a simple
optimization by starting with a larger matching. A naïve greedy algorithm can quickly
choose a valid matching that is usually close to the size of the maximal matching; we may
initialize our matrix with that matching to give the procedure a head start.
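The augmenting-path idea of steps 1 to 4 can be sketched in Python on the matrix form. This is a hedged illustration rather than the labeling procedure described above: it assumes a depth-first search for augmenting paths, starts from the empty matching, and uses the hypothetical names `max_bipartite_matching`, `try_augment`, and `match_of_col`.

```python
def max_bipartite_matching(matrix):
    """Return {row: column} for a maximum matching of a 0/1 matrix."""
    n_cols = len(matrix[0]) if matrix else 0
    match_of_col = [None] * n_cols   # row currently matched to each column

    def try_augment(row, visited):
        # Look for an augmenting path that starts at this row.
        for col in range(n_cols):
            if matrix[row][col] and col not in visited:
                visited.add(col)
                owner = match_of_col[col]
                # The column is free, or its current row can be re-matched elsewhere.
                if owner is None or try_augment(owner, visited):
                    match_of_col[col] = row   # flip the edges along the path
                    return True
        return False

    for row in range(len(matrix)):
        try_augment(row, set())      # each success increases the matching by one
    return {r: c for c, r in enumerate(match_of_col) if r is not None}
```

Applied to the 4 × 5 example matrix from the bipartite matching entry, this returns a matching that covers all four rows, although not necessarily the same starred entries.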

62.7 maximal matching/minimal edge covering theorem


Theorem. Let G be a graph. If M is a matching on G, and C is an edge covering for G,
then |M| ≤ |C|.


Proof. Consider an arbitrary matching M on G and an arbitrary edge covering C on G.
We will attempt to construct a one-to-one function f : M → C.


Consider some edge e ∈ M. At least one of the vertices that e joins must be in C, because C
is an edge covering and hence every edge is incident with some vertex in C. Call this vertex
$v_e$, and let $f(e) = v_e$.


Now we will show that f is one-to-one. Suppose we have two edges $e_1, e_2 \in M$ where
$f(e_1) = f(e_2) = v$. By the definition of f, $e_1$ and $e_2$ must both be incident with v. Since M is a
matching, however, no more than one edge in M can be incident with any given vertex in
G. Therefore $e_1 = e_2$, so f is one-to-one.


Hence we now have that |M| ≤ |C|.


Corollary. Let G be a graph. Let M and C be a matching and an edge covering on G,
respectively. If |M| = |C|, then M is a maximal matching and C is a minimal edge covering.

Proof. Suppose M is not a maximal matching. Then, by definition, there exists another
matching M' where |M| < |M'|. But then |M'| > |C|, which violates the above theorem.


Likewise, suppose C is not a minimal edge covering. Then, by definition, there exists another
covering C' where |C'| < |C|. But then |C'| < |M|, which violates the above theorem.
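The inequality can be observed on small graphs by exhaustive search. The sketch below is a brute-force illustration only (exponential time, hypothetical function names); it uses the document's terminology, in which an edge covering is a set of vertices meeting every edge.

```python
from itertools import combinations

def is_matching(edge_subset):
    """True if no vertex is incident with more than one of the given edges."""
    seen = set()
    for a, b in edge_subset:
        if a in seen or b in seen:
            return False
        seen.update((a, b))
    return True

def is_edge_covering(vertex_subset, edges):
    """True if every edge is incident with at least one chosen vertex."""
    chosen = set(vertex_subset)
    return all(a in chosen or b in chosen for a, b in edges)

def largest_matching_size(edges):
    for k in range(len(edges), -1, -1):
        if any(is_matching(m) for m in combinations(edges, k)):
            return k

def smallest_covering_size(vertices, edges):
    for k in range(len(vertices) + 1):
        if any(is_edge_covering(c, edges) for c in combinations(vertices, k)):
            return k
```

On a path with four vertices both values are 2, so by the corollary both are optimal; on a 5-cycle the matching has size 2 and the covering size 3, showing that the inequality can be strict.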

Chapter 63


05C75 — Structural characterization of 
types of graphs 


63.1 multigraph 


A multigraph is a graph in which we allow more than one edge to join a pair of vertices.
Two or more edges that join a pair of vertices are called parallel edges. Every graph, then,
is a multigraph, but not all multigraphs are graphs.

63.2 pseudograph


A pseudograph is a graph that allows both parallel edges and loops.

Chapter 64


05C80 — Random graphs 


64.1 examples of probabilistic proofs 


The first example is the existence of k-paradoxical tournaments. The proof hinges upon the
following basic probabilistic inequality: for any events A and B,


$$P(A \cup B) \le P(A) + P(B).$$


Theorem 5. For every k, there exists a tournament T (usually very large) such that T is 
k-paradoxical. 


Let n = |T| be the number of vertices of T, where n > k. We will show that for n large
enough, a k-paradoxical tournament must exist. The probability space in question is all
possible directions of the arrows of T, where each arrow can point in either direction with
probability 1/2, independently of any other arrow.


We say that a set K of k vertices is arrowed by a vertex $v_0$ outside the set if every arrow
between $v_0$ and $w_i \in K$ points from $v_0$ to $w_i$, for i = 1, ..., k. Consider a fixed set K of k
vertices and a fixed vertex $v_0$ outside K. There are k arrows between $v_0$ and K, and only
one arrangement of these arrows permits K to be arrowed by $v_0$; thus

$$P(K \text{ is arrowed by } v_0) = \frac{1}{2^k}.$$

The probability of the complementary event is therefore

$$P(K \text{ is not arrowed by } v_0) = 1 - \frac{1}{2^k}.$$

By independence, and because there are n − k vertices outside of K,

$$P(K \text{ is not arrowed by any vertex}) = \left(1 - \frac{1}{2^k}\right)^{n-k}. \tag{64.1.1}$$

Lastly, since there are $\binom{n}{k}$ sets of k vertices in T, we employ the inequality mentioned
above to obtain that for the union of all events of the form in equation (64.1.1)

$$P(\text{some set of } k \text{ vertices is not arrowed by any vertex}) \le \binom{n}{k}\left(1 - \frac{1}{2^k}\right)^{n-k}.$$

If the probability of this last event is less than 1 for some n, then there must exist a
k-paradoxical tournament of n vertices. Indeed there is such an n, since

$$\binom{n}{k}\left(1 - \frac{1}{2^k}\right)^{n-k} = \frac{1}{k!}\, n(n-1)\cdots(n-k+1)\left(1 - \frac{1}{2^k}\right)^{n-k}.$$

Therefore, regarding k as fixed while n tends to infinity, the right-hand side above tends to
zero. In particular, for some n it is less than 1, and the result follows.
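To get a feel for how large n must be, the bound can simply be evaluated. The sketch below (function names are hypothetical) computes the union bound $\binom{n}{k}(1 - 2^{-k})^{n-k}$ in floating point and searches for the first n at which it drops below 1; this gives an n for which the argument applies, not necessarily the smallest order of a k-paradoxical tournament.

```python
from math import comb

def union_bound(n, k):
    """Upper bound on P(some k-set is not arrowed by any vertex)."""
    return comb(n, k) * (1 - 2.0 ** -k) ** (n - k)

def first_n_with_bound_below_one(k):
    """Smallest n > k for which the union bound is strictly less than 1."""
    n = k + 1
    while union_bound(n, k) >= 1:
        n += 1
    return n
```

For k = 2 the bound is still about 1.07 at n = 20 and about 0.89 at n = 21, so the argument guarantees a 2-paradoxical tournament on 21 vertices.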

64.2 probabilistic method


The probabilistic method was pioneered by Erdős Pál (known to Westerners as Paul
Erdős) and initially used for solving problems in graph theory, but has acquired ever wider
applications. Broadly, the probabilistic method is somewhat the opposite of extremal graph
theory. Instead of considering how a graph can behave in the most extreme case, we
consider how a collection of graphs behaves "on average", whereby we can formulate a
probability space. The fruits reaped by this method are often raw existence theorems, usually
deduced from the fact that the nonexistence of whatever sort of graph would mean a zero
probability. For instance, by means of the probabilistic method, Erdős proved the existence
of a graph of arbitrarily high girth and chromatic number, a very counterintuitive result.
Graphs tend to get enormous as the chromatic number and girth increase, thereby severely
hindering the computations needed to construct them explicitly, so an existence theorem is
most welcome.


In all honesty, probabilistic proofs are nothing more than counting proofs in disguise, since
determining the probabilities of interest will invariably involve detailed counting arguments.
In fact, we could remove from any probabilistic proof any mention of a probability space,
although the result may be significantly less transparent. Also, the advantage of using
probability is that we can employ all the machinery of probability theory:
probabilistic inequalities and many other results all become the
tools of the trade in dealing with seemingly static objects of combinatorics and number
theory.

Chapter 65


05C90 — Applications 


65.1 Hasse diagram 


If (A, R) is a finite poset, then it can be represented by a Hasse diagram, where a line is
drawn from x ∈ A up to y ∈ A if


• xRy


• There is no z ∈ A such that xRz and zRy. (There are no in-between elements.)


Since we are always drawing from lower to higher elements, we do not need to direct any
edges.


Example: If A = P({1, 2, 3}), the power set of {1, 2, 3}, and R is the subset relation ⊆, then
we can draw the following:


                {1, 2, 3}

      {1, 2}     {1, 3}     {2, 3}

       {1}        {2}        {3}

                   ∅

(Each set is joined by a line to every set in the row above that contains it.)


Even though {3}R{1, 2, 3} (since {3} ⊆ {1, 2, 3}), we do not draw a line directly between
them because there are in-between elements: {2, 3} and {1, 3}. However, there still remains
an indirect path from {3} to {1, 2, 3}.
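The two conditions translate directly into a small computation of the lines of a Hasse diagram. In this sketch the names `power_set` and `hasse_edges` are illustrative, and one assumption is made explicit: pairs with x = y and in-between candidates equal to x or y are excluded, since the relation ⊆ is reflexive.

```python
from itertools import combinations

def power_set(base):
    """All subsets of base, as frozensets."""
    items = sorted(base)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def hasse_edges(elements, related):
    """Pairs (x, y) with x R y, x != y, and no element strictly in between."""
    edges = []
    for x in elements:
        for y in elements:
            if x != y and related(x, y):
                in_between = any(
                    z != x and z != y and related(x, z) and related(z, y)
                    for z in elements
                )
                if not in_between:
                    edges.append((x, y))
    return edges
```

For the power set of {1, 2, 3} ordered by inclusion this yields 12 lines, one for each way of adding a single element to a set, and no line from {3} directly to {1, 2, 3}.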

Chapter 66


05C99 — Miscellaneous 


66.1 Euler’s polyhedron theorem 
If a connected plane graph G has n vertices, m edges and f faces, then

$$n - m + f = 2.$$

66.2 Poincaré formula


The Poincaré formula is a generalization of Euler’s polyhedron theorem for polyhedrons of
higher genus. If a polyhedron has n vertices, m edges, f faces and genus g, then

$$n - m + f = \chi(g),$$

where

$$\chi(g) = 2 - 2g$$

is known as the Euler characteristic.

66.3 Turán’s theorem


A graph G having n vertices which contains no p-clique, with p ≥ 2, has at most

$$\left(1 - \frac{1}{p-1}\right)\frac{n^2}{2}$$

edges.

66.4 Wagner’s theorem


A graph is planar if and only if it contains neither $K_5$ nor $K_{3,3}$ as a minor, where $K_5$ is
the complete graph of order 5 and $K_{3,3}$ is the complete bipartite graph of order 6. This is
equivalent to Kuratowski’s theorem.

66.5 block

A subgraph B of a graph G is a block of G if either it is a bridge (together with the vertices
incident with the bridge) or else it is a maximal 2-connected subgraph of G.


Any two blocks of a graph G have at most one vertex in common. Also, every vertex
belonging to at least two blocks is a cutvertex of G, and, conversely, every cutvertex belongs
to at least two blocks.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.6 bridge

A bridge of a graph G is an edge whose deletion increases the number of components of G.
The vertex analogue of a bridge is a cutvertex.

66.7 complete graph


The complete graph with n vertices, denoted $K_n$, contains all possible edges; that is, any
two vertices are adjacent.

The complete graph of 4 vertices, or $K_4$, looks like this:


The number of edges in $K_n$ is the (n − 1)th triangular number. Every vertex in $K_n$ has degree
n − 1; therefore $K_n$ has an Euler circuit if and only if n is odd. A complete graph always
has a Hamiltonian path, and the chromatic number of $K_n$ is always n.
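The counting claims are easy to confirm numerically. The sketch below builds $K_n$ on the vertices 0, ..., n − 1 (the function names are illustrative) so that the edge count and the degrees can be compared with the (n − 1)th triangular number and with n − 1.

```python
from itertools import combinations

def complete_graph(n):
    """Vertices and edges of K_n, with vertices labelled 0..n-1."""
    vertices = list(range(n))
    edges = list(combinations(vertices, 2))   # every pair of vertices is adjacent
    return vertices, edges

def triangular(k):
    """The k-th triangular number 1 + 2 + ... + k."""
    return k * (k + 1) // 2

def degree_of_each_vertex(vertices, edges):
    deg = {v: 0 for v in vertices}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg
```

Since $K_n$ is connected, it has an Euler circuit exactly when every degree n − 1 is even, which is the case exactly when n is odd.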

66.8 degree (of a vertex)


The degree of a vertex x is d(x) = |Γ(x)|, where Γ(x) is the neighborhood of x. If we want
to emphasize that the underlying graph is G, then we write $\Gamma_G(x)$ and $d_G(x)$; this notation
can be intuitively extended to many other graph theoretic functions.


The minimal degree of the vertices of a graph G is denoted by δ(G) and the maximal
degree by Δ(G). A vertex of degree 0 is said to be an isolated vertex. If δ(G) = Δ(G) = k,
that is, every vertex has degree k, then G is said to be k-regular or regular of degree k.
A graph is regular if it is k-regular for some k. A 3-regular graph is said to be cubic.
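These definitions can be made concrete with a few lines of Python. The representation (a vertex list plus a list of edges given as pairs) and the function names are assumptions made for this sketch, which also assumes a non-empty vertex set.

```python
def degrees(vertices, edges):
    """Map each vertex x to d(x), for a graph without loops."""
    deg = {v: 0 for v in vertices}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def min_max_degree(vertices, edges):
    """Return the pair (delta(G), Delta(G)); needs at least one vertex."""
    values = degrees(vertices, edges).values()
    return min(values), max(values)

def is_regular(vertices, edges):
    """True if delta(G) == Delta(G), i.e. G is k-regular for some k."""
    low, high = min_max_degree(vertices, edges)
    return low == high
```

For example, $K_4$ has δ = Δ = 3 and is therefore cubic, while a path on three vertices together with an isolated vertex has δ = 0 and Δ = 2.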


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.9 distance (in a graph)


Given vertices x and y, their distance d(x, y) is the minimal length of an x-y path. If
there is no x-y path then d(x, y) = ∞.
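In an unweighted graph this distance can be computed by breadth-first search. The sketch below assumes the graph is given as a dictionary `adj` mapping each vertex to a collection of its neighbors; that representation and the function name are choices made here, not part of the definition.

```python
from collections import deque
from math import inf

def distance(adj, x, y):
    """Length of a shortest x-y path, or inf if no such path exists."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:            # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return inf
```

Breadth-first search visits vertices in order of increasing distance from x, which is why the first time y is reached its recorded distance is minimal.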


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.10 edge-contraction


Given an edge xy of a graph G, the graph G/xy is obtained from G by contracting the edge
xy; that is, to get G/xy we identify the vertices x and y and remove all loops and duplicate
edges. A graph G' obtained by a sequence of edge-contractions is said to be a contraction
of G.
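A single contraction step might be sketched as follows. The representation (a vertex set plus edges given as pairs) and the convention that the merged vertex keeps the name x are assumptions of this example.

```python
def contract_edge(vertices, edges, x, y):
    """Return (vertices, edges) of G/xy, with y merged into x."""
    edge_set = {frozenset(e) for e in edges}
    if frozenset((x, y)) not in edge_set:
        raise ValueError("xy must be an edge of G")
    new_vertices = {v for v in vertices if v != y}
    new_edges = set()
    for a, b in edges:
        a = x if a == y else a          # identify y with x
        b = x if b == y else b
        if a != b:                      # remove loops
            new_edges.add(frozenset((a, b)))   # the set removes duplicate edges
    return new_vertices, new_edges
```

Contracting an edge of a triangle leaves a single edge, because the loop is discarded and the two edges to the third vertex become duplicates of each other.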


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.11 graph


A graph G is an ordered pair of disjoint sets (V, E) such that E is a subset of the set $V^{(2)}$
of unordered pairs of V. V and E are always assumed to be finite, unless explicitly stated
otherwise. The set V is the set of vertices (sometimes called nodes) and E is the set of
edges. If G is a graph, then V = V(G) is the vertex set of G, and E = E(G) is the edge
set. Typically, V(G) is defined to be nonempty. If x is a vertex of G, we sometimes write
x ∈ G instead of x ∈ V(G).


An edge {x, y} is said to join the vertices x and y and is denoted by xy. Thus xy and yx
denote the same edge; the vertices x and y are the endvertices of this edge. If xy ∈ E(G), then x and
y are adjacent, or neighboring, vertices of G, and the vertices x and y are incident with
the edge xy. Two edges are adjacent if they have exactly one common endvertex. Also,
x ~ y means that the vertex x is adjacent to the vertex y.


Notice that the definition allows pairs of the form {x,x}, which would correspond to a node 
joining to itself. 


Some graphs. 


If, on a given graph, there is at most one edge joining each pair of nodes, we say that the
graph is simple.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.12 graph minor theorem


If $(G_k)_{k \in \mathbb{N}}$ is an infinite sequence of finite graphs, then there exist two numbers m < n such
that $G_m$ is isomorphic to a minor of $G_n$.

This theorem (proven by Robertson and Seymour) is often referred to as the deepest result
in graph theory. It resolves Wagner’s conjecture in the affirmative and leads to an important
generalization of Kuratowski’s theorem.


Specifically, to every set $\mathcal{G}$ of finite graphs which is closed under taking minors (meaning if
$G \in \mathcal{G}$ and H is isomorphic to a minor of G, then $H \in \mathcal{G}$) there exist finitely many graphs
$G_1, \ldots, G_n$ such that $\mathcal{G}$ consists precisely of those finite graphs that do not have a minor
isomorphic to one of the $G_i$. The graphs $G_1, \ldots, G_n$ are often referred to as the forbidden
minors for the class $\mathcal{G}$.

66.13 graph theory


Graph theory is the branch of mathematics that concerns itself with graphs.


The concept of graphs is extraordinarily simple, which explains their wide applicability. It
is usually agreed upon that graph theory proper was born in 1736, when Euler formalized
the now-famous bridges of Königsberg problem. Graph theory has now grown to touch
almost every mathematical discipline, in one form or another, and it likewise borrows from
elsewhere tools for its own problems. Anyone who delves into the topic will quickly see that
the lifeblood of graph theory is the abundance of tricky questions and clever answers. There
are, of course, general results that systematize the subject, but we also find an emphasis on
the solutions of substantial problems over building machinery for its own sake.


For a long time graph theory was regarded as a branch of topology concerned with 1-dimensional
simplices; however, this view has faded away. The only remainder of the topological past is
topological graph theory, a branch of graph theory that primarily deals with drawing
graphs on surfaces. The most famous achievement of topological graph theory is the proof
of the four-color conjecture (every political map on the surface of a plane or a sphere can be
colored with four colors, given that each country consists of only one piece).


Now, a graph is usually thought of as a subset of pairs of elements of a finite set
(called vertices), or more generally as a family of arbitrary sets in the case of hypergraphs. For
instance, Ramsey theory as applied to graph theory deals with determining how disordered
graphs can be. The central result here is Ramsey’s theorem, which states that one can
always find many vertices that are either all connected or all disconnected from each other,
given that the graph is sufficiently large. The other result is Szemerédi’s regularity lemma.


The four-color conjecture mentioned above is one of the problems in graph coloring. There
are many ways one can color a graph, but the most common are vertex coloring and edge
coloring. In these types of colorings, one colors vertices (edges) of a graph so that no two
vertices of the same color are adjacent (resp. no edges of the same color share a common
vertex). The most common problem is to find a coloring using the fewest number of colors
possible. Such problems often arise in scheduling problems.


Graph theory benefits greatly from interaction with other fields of mathematics. For example,
probabilistic methods have become the standard tool in the arsenal of graph theorists, and
random graph theory has grown into a full-fledged branch of its own.

66.14 homeomorphism


We say that a graph G is homeomorphic to a graph H if the realization R(G) of G is
topologically homeomorphic to R(H) or, equivalently, G and H have isomorphic subdivisions.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.15 loop


In graph theory, a loop is an edge which joins a vertex to itself, rather than to some other
vertex. By definition, a graph cannot contain a loop; a pseudograph, however, may contain
both multiple edges and multiple loops. Note that by some definitions, a multigraph may
contain multiple edges and no loops, while other texts define a multigraph as a graph allowing
multiple edges and multiple loops.


In algebra, a loop is a quasigroup which contains an identity element.

66.16 minor (of a graph)


A graph H is a minor of G, written $G \succ H$ or $H \prec G$, if it is a subgraph of a graph obtained
from G by a sequence of edge-contractions.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.


Version: 1 Owner: digitalis Author(s): digitalis 




66.17 neighborhood (of a vertex) 


For a graph G, the set of vertices adjacent to a vertex x ∈ G, the neighborhood of x, is
denoted by Γ(x). Occasionally one calls Γ(x) the open neighborhood of x, and Γ(x) ∪ {x} the
closed neighborhood of x.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.18 null graph


Figure 1: Null Graph.


A graph with zero vertices and edges. One possible representation of the null graph is shown
above.


The null graph is the initial object in the category of graphs.

Further Reading


• ‘Is the null graph a pointless concept’, by Frank Harary and Ronald Read

66.19 order (of a graph)


The order of a graph G is the number of vertices in G; it is denoted by |G|. The same
notation is used for the number of elements (the cardinality) of a set. Thus, |G| = |V(G)|. We
write $G^n$ for an arbitrary graph of order n. Similarly, G(n, m) denotes an arbitrary
graph of order n and size m.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.


Version: 4 Owner: digitalis Author(s): digitalis 




66.20 proof of Euler’s polyhedron theorem 


This proof is not one of the standard proofs given for Euler’s formula. I found the idea
presented in one of Coxeter’s books. It presents a different approach to the formula that may
be more familiar to modern students who have been exposed to a “Discrete Mathematics”
course. It falls into the category of “informal” proofs: proofs which assume without proof
certain properties of planar graphs usually proved with algebraic topology. This one makes
deep (but somewhat hidden) use of the Jordan curve theorem.


Let G = (V, E) be a planar graph; we consider some particular planar embedding of G. Let
F be the set of faces of this embedding. Also let G' = (F, E') be the dual graph (E' contains
an edge between any 2 adjacent faces of G). The planar embeddings of G and G' determine
a correspondence between E and E': two vertices of G are adjacent iff they both belong to
a pair of adjacent faces of G; denote by $\phi : E \to E'$ this correspondence.


In all illustrations, we represent a planar graph G, and the two sets of edges T ⊆ E (in red) and
T' ⊆ E' (in blue).


Let T ⊆ E be a spanning tree of G. Let $T' = E' \setminus \phi[T]$. We claim that T' is a spanning tree
of G'. Indeed,


T' contains no loop.
Given any loop of edges in T', we may draw a loop on the faces of G which participate
in the loop. This loop must partition the vertices of G into two non-empty sets, and
only crosses edges of E \ T. Thus, (V, T) has more than a single connected component,
so T is not spanning. [The proof of this utilizes the Jordan curve theorem.]


T' spans G'.
For suppose T' does not connect all faces F. Let $f_1, f_2 \in F$ be two faces with no path
between them in T'. Then T must contain a cycle separating $f_1$ from $f_2$, and cannot
be a tree. [The proof of this utilizes the Jordan curve theorem.]


We thus have a partition $E = T \cup \phi^{-1}[T']$ of the edges of G into two sets. Recall that in
any tree, the number of edges is one less than the number of vertices. It follows that

$$|E| = |T| + |T'| = (|V| - 1) + (|F| - 1) = |V| + |F| - 2,$$

as required.

66.21 proof of Turán’s theorem


If the graph G has n ≤ p − 1 vertices it cannot contain any p-clique and thus has at most
$\binom{n}{2}$ edges. So in this case we only have to prove that

$$\frac{n(n-1)}{2} \le \left(1 - \frac{1}{p-1}\right)\frac{n^2}{2}.$$

This of course is easy: dividing both sides by $n^2/2$, we get

$$1 - \frac{1}{n} \le 1 - \frac{1}{p-1}.$$

This is true since n ≤ p − 1.


So now we assume that n ≥ p and the set of vertices of G is denoted by V. If G has the
maximum number of edges possible without containing a p-clique it contains a (p − 1)-clique,
since otherwise we might add edges to get one. So we denote one such clique by A and define
B := G \ A.


So A has $\binom{p-1}{2}$ edges. We are now interested in the number of edges in B, which we will call
$e_B$, and in the number of edges connecting A and B, which will be called $e_{A,B}$. By induction
we get:

$$e_B \le \frac{1}{2}\left(1 - \frac{1}{p-1}\right)(n - p + 1)^2.$$

Since G does not contain any p-clique, every vertex of B is connected to at most p − 2
vertices in A and thus we get:

$$e_{A,B} \le (p - 2)(n - p + 1).$$

Putting this together we get for the number of edges |E| of G:

$$|E| \le \binom{p-1}{2} + \frac{1}{2}\left(1 - \frac{1}{p-1}\right)(n - p + 1)^2 + (p - 2)(n - p + 1).$$

And thus we get:

$$|E| \le \left(1 - \frac{1}{p-1}\right)\frac{n^2}{2}.$$


66.22 realization 


Let $p_1, p_2, \ldots$ be distinct points in $\mathbb{R}^3$, the 3-dimensional Euclidean space, such that every
plane in $\mathbb{R}^3$ contains at most 3 of these points. Write $(p_i, p_j)$ for the straight line segment
with endpoints $p_i$ and $p_j$ (open or closed, as you like). Given a graph G = (V, E),
$V = (x_1, x_2, \ldots, x_n)$, the topological space

$$R(G) = \bigcup \{(p_i, p_j) : x_i x_j \in E\} \cup \bigcup_{i=1}^{n} \{p_i\} \subset \mathbb{R}^3$$

is said to be a realization of G.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.23 size (of a graph)

The size of a graph G is the number of edges in G; it is denoted by e(G). G(n, m) denotes
an arbitrary graph of order n and size m.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.24 subdivision


A graph H is said to be a subdivision of a graph G, or a topological G graph, if H is
obtained from G by subdividing some of the edges, that is, by replacing the edges by paths
having at most their endvertices in common. We often use TG for a topological G graph.


Thus, TG denotes any member of a large family of graphs; for example, $TC_4$ is an arbitrary
cycle of length at least 4. For any graph G, the spaces R(G) (denoting the realization of G)
and R(TG) are homeomorphic.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.25 subgraph


We say that G' = (V', E') is a subgraph of G = (V, E) if V' ⊆ V and E' ⊆ E. In this case
we write G' ⊆ G.


If G' contains all edges of G that join two vertices in V' then G' is said to be the subgraph
induced or spanned by V' and is denoted by G[V']. Thus, a subgraph G' of G is an
induced subgraph if G' = G[V(G')]. If V' = V, then G' is said to be a spanning subgraph
of G.


Often, new graphs are constructed from old ones by deleting or adding some vertices and
edges. If W ⊆ V(G), then G − W = G[V \ W] is the subgraph of G obtained by deleting
the vertices in W and all edges incident with them. Similarly, if E' ⊆ E(G), then
G − E' = (V(G), E(G) \ E'). If W = {w} and E' = {xy}, then this notation is simplified to
G − w and G − xy. Similarly, if x and y are nonadjacent vertices of G, then G + xy is
obtained from G by joining x to y.


Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.

66.26 wheel graph


The wheel graph of n vertices $W_n$ is a graph that contains a cycle of length n − 1 plus a
vertex v (sometimes called the hub) not in the cycle such that v is connected to every other
vertex. The edges connecting v to the rest of the graph are sometimes called spokes.

Chapter 67


05D05 — Extremal set theory 


67.1 LYM inequality 


Let F be a Sperner family, that is, a collection of subsets of {1, 2, ..., n} such that no set
in the collection contains any other. Then

$$\sum_{X \in F} \frac{1}{\binom{n}{|X|}} \le 1.$$

This is known as the LYM inequality, after the names of the three people who independently
discovered it: Lubell[2], Yamamoto[4], Meshalkin[8].


Since (7) < (,,",)) for every [integer] k, LYM inequality tells us that |F] / (n/a ) < 1 which is 
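

As a small sanity check, the LYM sum can be evaluated exactly for concrete families. The sketch below is illustrative only; the helper names and the two sample families are assumptions, not part of the entry.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def is_sperner(family):
    """True if no member of the family properly contains another member."""
    return all(not (a < b or b < a) for a, b in combinations(family, 2))


def lym_sum(family, n):
    """Sum of 1 / C(n, |X|) over the sets X in the family, as an exact fraction."""
    return sum(Fraction(1, comb(n, len(x))) for x in family)


n = 4
# All 2-element subsets of {1, 2, 3, 4}: the middle layer, where equality holds.
middle_layer = [frozenset(c) for c in combinations(range(1, n + 1), 2)]
# A Sperner family mixing set sizes.
mixed = [frozenset({1}), frozenset({2, 3}), frozenset({2, 4}), frozenset({3, 4})]

middle_total = lym_sum(middle_layer, n)
mixed_total = lym_sum(mixed, n)
```

The middle layer attains the bound exactly, while the mixed family stays strictly below 1.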
67.2 Sperner’s theorem


What is the size of the largest family $F$ of subsets of an $n$-element set such that no $A \in F$
is a subset of another $B \in F$? Sperner [8] gave an answer in the following elegant theorem:


Theorem 7. For every family $F$ of incomparable subsets of an $n$-set, $|F| \le \binom{n}{\lfloor n/2 \rfloor}$.


A family satisfying the conditions of Sperner’s theorem is usually called a Sperner family or
an antichain. The latter terminology stems from the fact that subsets of a finite set ordered by
inclusion form a Boolean lattice.


There are many generalizations of Sperner’s theorem. On one hand, there are refinements
like the LYM inequality that strengthen the theorem in various ways. On the other hand, there
are generalizations to posets other than the Boolean lattice. For a comprehensive exposition
of the topic one should consult the well-written monograph by Engel[2].
Chapter 68


05D10 — Ramsey theory 


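68.1 Erdős-Rado theorem
Repeated exponentiation for cardinals is denoted $\exp_i(\kappa)$, where $i < \omega$. It is defined by:
$$\exp_0(\kappa) = \kappa$$
and
$$\exp_{i+1}(\kappa) = 2^{\exp_i(\kappa)}.$$
The Erdős-Rado theorem states that:
$$\exp_i(\kappa)^+ \to (\kappa^+)^{i+1}_{\kappa}$$


That is, if $f : [\exp_i(\kappa)^+]^{i+1} \to \kappa$ then there is a homogeneous set of size $\kappa^+$.


As special cases, $(2^{\kappa})^+ \to (\kappa^+)^2_{\kappa}$ and $(2^{\aleph_0})^+ \to (\aleph_1)^2_{\aleph_0}$.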
68.2 Ramsey’s theorem


Ramsey’s theorem states that a particular arrows relation,
$$\omega \to (\omega)^n_k,$$
holds for any integers $n$ and $k$.


In words, if $f$ is a function on sets of integers of size $n$ whose range is finite then there is
some infinite $X \subseteq \omega$ such that $f$ is constant on the subsets of $X$ of size $n$.


As an example, suppose $f : [\omega]^2 \to \{0, 1\}$ is defined by:
$$f(\{x, y\}) = \begin{cases} 1 & \text{if } x = y^2 \text{ or } y = x^2 \\ 0 & \text{otherwise} \end{cases}$$
Then let $X \subseteq \omega$ be the set of integers which are not perfect squares. This is clearly infinite,
and obviously if $x, y \in X$ then neither $x = y^2$ nor $y = x^2$, so $f$ is homogeneous on $X$.
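

The example can be checked on a finite initial segment of the integers. This is only a sketch: the cutoff `limit` is an arbitrary assumption, and a finite check illustrates the claim rather than proving it.

```python
from itertools import combinations
from math import isqrt


def f(x, y):
    """The colouring from the example: 1 if one number is the square of the other."""
    return 1 if x == y * y or y == x * x else 0


def is_perfect_square(m):
    return isqrt(m) ** 2 == m


limit = 200  # arbitrary cutoff for the finite check
non_squares = [m for m in range(limit) if not is_perfect_square(m)]

# Every pair of distinct non-squares should receive colour 0.
colours_on_x = {f(x, y) for x, y in combinations(non_squares, 2)}
```

By contrast, `f(2, 4)` is 1, so the colouring is not constant on all of the integers.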
68.3 Ramsey’s theorem


The original version of Ramsey’s theorem states that for every pair of positive integers $k_1$ and
$k_2$ there is $n$ such that if the edges of a complete graph on $n$ vertices are colored in two
colors, then there is either a $k_1$-clique of the first color or a $k_2$-clique of the second color.


The standard proof proceeds by induction on $k_1 + k_2$. If $k_1, k_2 \le 2$, then the theorem holds
trivially. To prove the induction step we consider a graph $G$ that contains no cliques of the
desired kind, and then consider any vertex $v$ of $G$. Partition the rest of the vertices into
two classes $C_1$ and $C_2$ according to whether edges from $v$ are of color 1 or 2 respectively.
By the inductive hypothesis $|C_1|$ is bounded, since it contains no $(k_1 - 1)$-clique of color 1 and no
$k_2$-clique of color 2. Similarly, $|C_2|$ is bounded.


A similar argument shows that for any positive integers $k_1, k_2, \ldots, k_t$, if we color the edges of a
sufficiently large graph in $t$ colors, then we can find either a $k_1$-clique of color 1,
or a $k_2$-clique of color 2, ..., or a $k_t$-clique of color $t$.


The minimal $n$ whose existence is stated in Ramsey’s theorem is called the Ramsey number
and denoted by $R(k_1, k_2)$ (and $R(k_1, k_2, \ldots, k_t)$ for multicolored graphs). The above proof
shows that $R(k_1, k_2) \le R(k_1, k_2 - 1) + R(k_1 - 1, k_2)$. From that it is not hard to deduce by
induction that $R(k_1, k_2) \le \binom{k_1 + k_2 - 2}{k_1 - 1}$. In the most interesting case $k_1 = k_2 = k$ this yields
approximately $R(k, k) \le (4 + o(1))^k$. The lower bounds can be established by means of a
probabilistic construction as follows.
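

The step from the recurrence to the binomial bound can be checked numerically. The sketch below iterates the recurrence as if it were an equality, starting from the assumed base case that a 1-clique always exists (so the bound is 1 when either argument is 1); the function name is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def upper(k1, k2):
    """Upper bound on R(k1, k2) obtained by unrolling the recurrence."""
    if k1 == 1 or k2 == 1:
        return 1  # assumed base case: a single vertex is a 1-clique
    return upper(k1, k2 - 1) + upper(k1 - 1, k2)


# The recurrence is Pascal's rule, so the result agrees with the binomial bound.
agrees = all(
    upper(a, b) == comb(a + b - 2, a - 1)
    for a in range(1, 9)
    for b in range(1, 9)
)
```

For instance `upper(3, 3)` evaluates to 6 and `upper(4, 4)` to 20.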


Take a complete graph of size $n$ and color its edges at random, choosing the color of each
edge uniformly and independently of all other edges. The probability that any given set of $k$
vertices is a monochromatic clique is $2^{1 - \binom{k}{2}}$. Let $I_r$ be the random variable which is 1 if the
$r$’th set of $k$ elements is a monochromatic clique and is 0 otherwise. The sum of the $I_r$’s over all
$k$-element sets is simply the number of monochromatic $k$-cliques. Therefore by linearity of
expectation $E(\sum I_r) = \sum E(I_r) = 2^{1 - \binom{k}{2}} \binom{n}{k}$. If the expectation is less than 1, then there
exists a coloring which has no monochromatic $k$-cliques. A little exercise in calculus shows
that if we choose $n$ to be no more than $(2 + o(1))^{k/2}$ then the expectation is indeed less than 1.
Hence, $R(k, k) \ge (\sqrt{2} + o(1))^k$.
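

The expectation can be evaluated exactly for small $k$, which shows how large $n$ may be before the argument stops working. This is a sketch with hypothetical function names; it assumes $k \ge 3$ so that the expectation is below 1 at $n = k$.

```python
from fractions import Fraction
from math import comb


def expected_mono_cliques(n, k):
    """Exact value of C(n, k) * 2^(1 - C(k, 2))."""
    return comb(n, k) * Fraction(2, 2 ** comb(k, 2))


def largest_n_with_expectation_below_one(k):
    """Largest n for which the expected number of monochromatic k-cliques is < 1.

    Assumes k >= 3. For such n some colouring has no monochromatic k-clique,
    so R(k, k) exceeds the returned value.
    """
    n = k
    while expected_mono_cliques(n + 1, k) < 1:
        n += 1
    return n


bounds = {k: largest_n_with_expectation_below_one(k) for k in (3, 4, 5)}
```

These small cases are far from sharp; the interest of the method lies in the exponential growth of the bound as $k$ increases.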


The gap between the lower and upper bounds has been open for several decades. There
have been a number of improvements in the $o(1)$ terms, but nothing better than $\sqrt{2} + o(1) \le
R(k, k)^{1/k} \le 4 + o(1)$ is known. It is not even known whether $\lim_{k \to \infty} R(k, k)^{1/k}$ exists.


The behavior of $R(k, x)$ for fixed $k$ and large $x$ is equally mysterious. For this case Ajtai,
Komlós and Szemerédi[1] proved that $R(k, x) \le c_k x^{k-1} / (\ln x)^{k-2}$. The matching lower bound
has only recently been established for $k = 3$ by Kim [4]. Even in this case the asymptotics
is unknown. The combination of the results of Kim and the improvement of Ajtai, Komlós and
Szemerédi’s result by Shearer [6] yields $\frac{1}{162}(1 + o(1)) k^2 / \log k \le R(3, k) \le (1 + o(1)) k^2 / \log k$.


A lot of machine and human time has been spent trying to determine Ramsey numbers for
small $k_1$ and $k_2$. An up-to-date summary of our knowledge about small Ramsey numbers
can be found in [5]. 
68.4 arrows


Let $[X]^{\alpha} = \{Y \subseteq X \mid |Y| = \alpha\}$, that is, the set of subsets of $X$ of size $\alpha$. Then given some
cardinals $\kappa$, $\lambda$, $\alpha$ and $\beta$,
$$\kappa \to (\lambda)^{\alpha}_{\beta}$$
states that for any set $X$ of size $\kappa$ and any function $f : [X]^{\alpha} \to \beta$, there is some $Y \subseteq X$ and
some $\gamma \in \beta$ such that $|Y| = \lambda$ and for any $y \in [Y]^{\alpha}$, $f(y) = \gamma$.


In words, if $f$ is a partition of $[X]^{\alpha}$ into $\beta$ subsets then $f$ is constant on a subset of size $\lambda$ (a
homogeneous subset).


As an example, the pigeonhole principle is the statement that if $n$ is finite and $k < n$ then:
$$n \to (2)^1_k$$


That is, if you try to partition $n$ into fewer than $n$ pieces then one piece has more than one
element.
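

For small finite values the relation $n \to (2)^1_k$ can be verified by brute force over all colourings. The function name is hypothetical and the sketch assumes $k \ge 1$; it enumerates $k^n$ functions, so it is only practical for tiny inputs.

```python
from itertools import product


def arrows_two(n, k):
    """Check n -> (2)^1_k: every f from an n-element set into k colours
    gives two distinct elements the same colour."""
    for f in product(range(k), repeat=n):
        if len(set(f)) == n:
            # f is injective, so no homogeneous set of size 2 exists.
            return False
    return True


holds = arrows_two(4, 3)      # k < n: the pigeonhole principle applies
fails = arrows_two(3, 3)      # k = n: an injective colouring exists
```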


Observe that if
$$\kappa \to (\lambda)^{\alpha}_{\beta}$$
then the same statement holds if:


• $\kappa$ is made larger (since the restriction of $f$ to a set of size $\kappa$ can be considered)


• $\lambda$ is made smaller (since a subset of the homogeneous set will suffice)


• $\beta$ is made smaller (since any partition into fewer than $\beta$ pieces can be expanded by
adding empty sets to the partition)


• $\alpha$ is made smaller (since a partition $f$ of $[\kappa]^{\gamma}$ where $\gamma < \alpha$ can be extended to a
partition $f'$ of $[\kappa]^{\alpha}$ by $f'(X) = f(X_{\gamma})$ where $X_{\gamma}$ is the set of the $\gamma$ smallest elements of $X$)


$$\kappa \nrightarrow (\lambda)^{\alpha}_{\beta}$$
is used to state that the corresponding $\to$ relation is false.
68.5 coloring


A coloring of a set $X$ by $Y$ is just a function $f : X \to Y$. The term coloring is used because
the function can be thought of as assigning a “color” from $Y$ to each element of $X$.


Any coloring provides a partition of $X$: for each $y \in Y$, $f^{-1}(y)$, the set of elements $x$ such
that $f(x) = y$, is one element of the partition. Since $f$ is a function, the sets in the partition
are disjoint, and since it is a total function, their union is $X$.
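

A minimal sketch of the partition induced by a coloring follows. The function name and the sample coloring are assumptions; note that the sketch only records the nonempty preimages.

```python
from collections import defaultdict


def partition_from_coloring(xs, f):
    """Group the elements of xs by colour: colour y maps to the preimage of y."""
    classes = defaultdict(set)
    for x in xs:
        classes[f(x)].add(x)
    return dict(classes)


X = range(10)
parts = partition_from_coloring(X, lambda x: x % 3)
```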
68.6 proof of Ramsey’s theorem


$$\omega \to (\omega)^n_k$$
is proven by induction on $n$.


If $n = 1$ then this just states that any partition of an infinite set into a finite number of
subsets must include an infinite set; that is, the union of a finite number of finite sets is
finite. This is simple enough to prove: since there are a finite number of sets, there is a
largest set, of size $x$. Let the number of sets be $y$. Then the size of the union is no more than
$xy$.


If
$$\omega \to (\omega)^n_k$$
then we can show that
$$\omega \to (\omega)^{n+1}_k.$$


Let $f$ be some coloring of $[S]^{n+1}$ by $k$ where $S$ is an infinite subset of $\omega$. Observe that, given
an $x < \omega$, we can define $f^x : [S \setminus \{x\}]^n \to k$ by $f^x(X) = f(\{x\} \cup X)$. Since $S$ is infinite, by
the induction hypothesis this will have an infinite homogeneous set.

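Then we define a sequence of integers $\langle n_i \rangle_{i \in \omega}$ and a sequence of infinite subsets of $\omega$, $\langle S_i \rangle_{i \in \omega}$,
by induction. Let $n_0 = 0$ and let $S_0 = \omega$. Given $n_i$ and $S_i$ for $i < j$ we can define $S_j$ as an
infinite homogeneous set for $f^{n_{j-1}} : [S_{j-1} \setminus \{n_{j-1}\}]^n \to k$ and $n_j$ as the least element of $S_j$.


Obviously $N = \bigcup \{n_i\}$ is infinite, and each $n_i$ is contained in $S_j$
for each $j \le i$. Hence, for every $i$, the value of $f$ on an $(n+1)$-element subset of $N$ whose least
element is $n_i$ is the constant value $c_i$ that $f^{n_i}$ takes on $[S_{i+1}]^n$. Since there are only $k$ possible
values $c_i$, by the case $n = 1$ infinitely many of the $n_i$ share the same value, and the set of these
$n_i$ is an infinite homogeneous set for $f$.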
Chapter 69


05D15 — Transversal (matching) 
theory 


69.1 Hall’s marriage theorem


Let $S = \{S_1, S_2, \ldots, S_n\}$ be a finite collection of finite sets. There exists a system of distinct representatives
of $S$ if and only if the following condition holds for any $T \subseteq S$:
$$\left| \bigcup T \right| \ge |T|$$


As a corollary, if this condition fails to hold anywhere, then no SDR exists. 


This is known as Hall’s marriage theorem. The name arises from a particular application of
this theorem. Suppose we have a finite set of single men/women, and, for each man/woman, 
a finite collection of women/men to whom this person is attracted. An SDR for this collection 
would be a way each man/woman could be (theoretically) married happily. Hence, Hall’s 
marriage theorem can be used to determine if this is possible. 


An application of this theorem to graph theory gives that if $G(V_1, V_2, E)$ is a bipartite graph
then $G$ has a complete matching that saturates every vertex of $V_1$ if and only if $|S| \le |N(S)|$
for every subset $S \subseteq V_1$.
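

The following sketch checks Hall's condition by brute force and, separately, searches for an SDR with the standard augmenting-path method for bipartite matching. The augmenting-path search is a substitute technique chosen for the illustration, not the inductive argument used to prove the theorem, and all names are hypothetical.

```python
from itertools import combinations


def hall_condition_holds(sets):
    """Check |union of T| >= |T| for every nonempty subcollection T (exponential)."""
    for r in range(1, len(sets) + 1):
        for idx in combinations(range(len(sets)), r):
            union = set().union(*(sets[i] for i in idx))
            if len(union) < r:
                return False
    return True


def find_sdr(sets):
    """Return a list of distinct representatives (one per set), or None."""
    owner = {}  # element -> index of the set it currently represents

    def try_assign(i, seen):
        for x in sets[i]:
            if x in seen:
                continue
            seen.add(x)
            # Take x if it is free, or if its current owner can move elsewhere.
            if x not in owner or try_assign(owner[x], seen):
                owner[x] = i
                return True
        return False

    for i in range(len(sets)):
        if not try_assign(i, set()):
            return None
    representative = {i: x for x, i in owner.items()}
    return [representative[i] for i in range(len(sets))]


good = [{1, 2}, {2, 3}, {1, 3}]
bad = [{1}, {1}, {1, 2}]  # the first two sets share a single element
good_sdr = find_sdr(good)
bad_sdr = find_sdr(bad)
```

On both sample collections the existence of an SDR coincides with Hall's condition, as the theorem states.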
69.2 proof of Hall’s marriage theorem


We prove Hall’s marriage theorem by induction on $|S|$, the size of $S$.


The theorem is trivially true for |S| = 0. 
Assuming the theorem true for all $|S| < n$, we prove it for $|S| = n$.
First suppose that we have the stronger condition
$$\left| \bigcup T \right| \ge |T| + 1$$
for all $\emptyset \ne T \subsetneq S$. Pick any $x \in S_n$ as the representative of $S_n$; we must choose an SDR
from
$$S' = \{S_1 \setminus \{x\}, \ldots, S_{n-1} \setminus \{x\}\}.$$


But if
$$\{S_{j_1} \setminus \{x\}, \ldots, S_{j_k} \setminus \{x\}\} = T' \subseteq S'$$
then, by our assumption,
$$\left| \bigcup T' \right| \ge \left| \bigcup_{i=1}^{k} S_{j_i} \right| - 1 \ge k.$$


By the already-proven case of the theorem for $S'$ we see that we can indeed pick an SDR for
$S$.


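Otherwise, for some $\emptyset \ne T \subsetneq S$ we have the “exact” size
$$\left| \bigcup T \right| = |T|.$$


Inside $T$ itself, for any $T' \subseteq T \subseteq S$ we have
$$\left| \bigcup T' \right| \ge |T'|,$$
so by an already-proven case of the theorem we can pick an SDR for $T$.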


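It remains to pick an SDR for $S \setminus T$ which avoids all elements of $\bigcup T$ (these elements are in
the SDR for $T$). To use the already-proven case of the theorem (again) and do this, we
must show that for any $T' \subseteq S \setminus T$, even after discarding elements of $\bigcup T$ there remain enough
elements in $\bigcup T'$: we must prove
$$\left| \bigcup T' \setminus \bigcup T \right| \ge |T'|.$$
But
$$\left| \bigcup T' \setminus \bigcup T \right| = \left| \bigcup (T \cup T') \right| - \left| \bigcup T \right| \ge \qquad (69.2.1)$$
$$\ge |T \cup T'| - |T| = \qquad (69.2.2)$$
$$= |T| + |T'| - |T| = |T'|, \qquad (69.2.3)$$
using the disjointness of $T$ and $T'$. So by an already-proven case of the theorem, $S \setminus T$ does
indeed have an SDR which avoids all elements of $\bigcup T$.


QED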
69.3 saturate


Let $G(V, E)$ be a graph and $M$ a matching in $G$. A vertex $v \in V(G)$ is said to be saturated
by $M$ if there is an edge in $M$ incident to $v$. A vertex $v \in V(G)$ with no such edge is said
to be unsaturated by $M$. We also say that $M$ saturates $v$.
69.4 system of distinct representatives
Let $S = \{S_1, S_2, \ldots, S_n\}$ be a finite collection of finite sets. A system of distinct
representatives, or SDR, of $S$ is a set
$$x_1 \in S_1, \; x_2 \in S_2, \; \ldots, \; x_n \in S_n$$
such that
$$x_i \ne x_j \text{ whenever } i \ne j$$
(i.e., each choice must be unique).
Chapter 70


05E05 — Symmetric functions 


70.1 elementary symmetric polynomial 


The coefficient of $x^{n-k}$ in the polynomial $(x + t_1)(x + t_2) \cdots (x + t_n)$ is called the $k$-th
elementary symmetric polynomial in the $n$ variables $t_1, \ldots, t_n$.


The first few examples are:


• $n = 1$:
  $t_1$
• $n = 2$:
  $t_1 + t_2$
  $t_1 t_2$
• $n = 3$:
  $t_1 + t_2 + t_3$
  $t_1 t_2 + t_2 t_3 + t_1 t_3$
  $t_1 t_2 t_3$
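

Numerically, the values of the elementary symmetric polynomials can be read off by expanding the product one factor at a time. The following is a sketch with a hypothetical function name; it works with concrete numbers rather than symbolic variables.

```python
def elementary_symmetric(ts):
    """Return [e_0, e_1, ..., e_n] evaluated at the numbers ts, where e_k is the
    coefficient of x^(n-k) in (x + t_1)(x + t_2)...(x + t_n)."""
    coeffs = [1]  # the empty product is 1, so e_0 = 1
    for t in ts:
        # Multiply the current polynomial by (x + t): new e_k = e_k + t * e_(k-1).
        nxt = coeffs + [0]
        for k in range(1, len(nxt)):
            nxt[k] += t * coeffs[k - 1]
        coeffs = nxt
    return coeffs


values = elementary_symmetric([1, 2, 3])
```

For $t = (1, 2, 3)$ this gives $e_1 = 6$, $e_2 = 11$ and $e_3 = 6$, matching $(x+1)(x+2)(x+3) = x^3 + 6x^2 + 11x + 6$.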
70.2 reduction algorithm for symmetric polynomials


We give here an algorithm for reducing a symmetric polynomial into a polynomial in the
elementary symmetric polynomials.


We define the height of a monomial $x_1^{e_1} \cdots x_n^{e_n}$ in $R[x_1, \ldots, x_n]$ to be $e_1 + 2e_2 + \cdots + n e_n$. The
height of a polynomial is defined to be the maximum height of any of its monomial terms,
or 0 if it is the zero polynomial.


Let $f$ be a symmetric polynomial. We reduce $f$ into elementary symmetric polynomials by
induction on the height of $f$. Let $c x_1^{e_1} \cdots x_n^{e_n}$ be the monomial term of maximal height in $f$.
Consider the polynomial
$$g := f - c \, s_1^{e_n - e_{n-1}} s_2^{e_{n-1} - e_{n-2}} \cdots s_{n-1}^{e_2 - e_1} s_n^{e_1}$$
where $s_k$ is the $k$-th elementary symmetric polynomial in the $n$ variables $x_1, \ldots, x_n$. Then
$g$ is a symmetric polynomial of lower height than $f$, so by the induction hypothesis, $g$ is a
polynomial in $s_1, \ldots, s_n$, and it follows immediately that $f$ is also a polynomial in $s_1, \ldots, s_n$.
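

The reduction can be sketched in code as a loop in place of the induction. The representation is an assumption made for the illustration: a polynomial is a dictionary from exponent tuples $(e_1, \ldots, e_n)$ to coefficients, and the input is assumed to be symmetric (the sketch does not check this). When several terms share the maximal height, one of them is removed per pass.

```python
from itertools import combinations


def poly_mul(p, q):
    out = {}
    for ea, ca in p.items():
        for eb, cb in q.items():
            e = tuple(a + b for a, b in zip(ea, eb))
            out[e] = out.get(e, 0) + ca * cb
    return {e: c for e, c in out.items() if c != 0}


def poly_sub(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) - c
    return {e: c for e, c in out.items() if c != 0}


def elementary(k, n):
    """s_k in n variables: the sum of all products of k distinct variables."""
    return {tuple(1 if i in chosen else 0 for i in range(n)): 1
            for chosen in combinations(range(n), k)}


def height(e):
    return sum((i + 1) * ei for i, ei in enumerate(e))


def reduce_symmetric(f, n):
    """Return {(a_1, ..., a_n): c}, meaning f is the sum of c * s_1^a_1 ... s_n^a_n."""
    result = {}
    while f:
        e = max(f, key=height)  # a monomial term of maximal height
        c = f[e]
        padded = (0,) + e       # padded[i] is e_i, with e_0 = 0
        # The exponent of s_k is e_(n-k+1) - e_(n-k).
        a = tuple(padded[n - k + 1] - padded[n - k] for k in range(1, n + 1))
        term = {(0,) * n: c}
        for k, ak in enumerate(a, start=1):
            for _ in range(ak):
                term = poly_mul(term, elementary(k, n))
        f = poly_sub(f, term)
        result[a] = result.get(a, 0) + c
    return result


# x1^2 + x2^2 and x1^3 + x2^3 + x3^3
squares = reduce_symmetric({(2, 0): 1, (0, 2): 1}, 2)
cubes = reduce_symmetric({(3, 0, 0): 1, (0, 3, 0): 1, (0, 0, 3): 1}, 3)
```

The first result encodes $x_1^2 + x_2^2 = s_1^2 - 2 s_2$ and the second $x_1^3 + x_2^3 + x_3^3 = s_1^3 - 3 s_1 s_2 + 3 s_3$.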
Chapter 71


06-00 — General reference works 
(handbooks, dictionaries, 
bibliographies, etc.) 


71.1 equivalence relation 


An equivalence relation ~ on a set S is a relation that is: 


Reflexive. $a \sim a$ for all $a \in S$.
Symmetric. Whenever $a \sim b$, then $b \sim a$.
Transitive. If $a \sim b$ and $b \sim c$ then $a \sim c$.


If $a$ and $b$ are related this way we say that they are equivalent under $\sim$. If $a \in S$, then the
set of all elements of $S$ that are equivalent to $a$ is called the equivalence class of $a$.


An equivalence relation on a set induces a partition on it, and also any partition induces
an equivalence relation. Equivalence relations are important, because often the set $S$ can
be ‘transformed’ into another set (quotient space) by considering each equivalence class as
a single unit.


Two examples of equivalence relations:


1. Consider the set of integers $\mathbb{Z}$ and take a positive integer $m$. Then $m$ induces an
equivalence relation by $a \sim b$ when $m$ divides $b - a$ (that is, $a$ and $b$ leave the same
remainder when divided by $m$).


2. Take a group $(G, \cdot)$ and a subgroup $H$. Define $a \sim b$ whenever $ab^{-1} \in H$. That
defines an equivalence relation. Here equivalence classes are called cosets.
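

The first example can be made concrete by grouping a finite range of integers into classes. The helper below is a hypothetical sketch; it assumes that `related` really is an equivalence relation, so comparing against one representative per class suffices.

```python
def equivalence_classes(elements, related):
    """Partition elements into classes of the equivalence relation `related`."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, cls[0]):  # compare with the class representative
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes


m = 3
classes = equivalence_classes(range(-3, 6), lambda a, b: (b - a) % m == 0)
```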
Chapter 72


06-XX — Order, lattices, ordered
algebraic structures 


72.1 join 


Certain posets $X$ have a binary operator join, denoted $\vee$, such that $x \vee y$ is the lowest upper bound
of $x$ and $y$. Further, if $j$ and $j'$ are both joins of $x$ and $y$, then $j \le j'$ and $j' \le j$, and so
$j = j'$; thus a join, if it exists, is unique. Also known as the or operator.


72.2 meet


Certain posets $X$ have a binary operator meet, denoted $\wedge$, such that $x \wedge y$ is the greatest lower bound
of $x$ and $y$. Further, if $m$ and $m'$ are both meets of $x$ and $y$, then $m \le m'$ and $m \ge m'$, and
so $m = m'$; thus a meet, if it exists, is unique.
Chapter 73


06A06 — Partial order, general


73.1 directed set 


A directed set is a partially ordered set $(A, \le)$ such that whenever $a, b \in A$ there is $c \in A$
such that $a \le c$ and $b \le c$.


A subset $B \subseteq A$ is said to be residual iff there is $a \in A$ such that $b \in B$ whenever $a \le b$,
and cofinal iff for each $a \in A$ there is $b \in B$ such that $a \le b$.


Note: Many authors do not require $\le$ to be antisymmetric, so that $(A, \le)$ is only a preorder
with the above property.
73.2 infimum


The infimum of a set $S$ is the greatest lower bound of $S$ and is denoted $\inf(S)$.


Let $A$ be a set with a partial order $\le$, and let $S \subseteq A$. For any $x \in A$, $x$ is a lower bound
of $S$ if $x \le y$ for any $y \in S$. The infimum of $S$, denoted $\inf(S)$, is the greatest such lower
bound; that is, if $b$ is a lower bound of $S$, then $b \le \inf(S)$.


Note that it is not necessarily the case that $\inf(S) \in S$. Suppose $S = (0, 1)$; then $\inf(S) = 0$,
but $0 \notin S$.


Also note that a set does not necessarily have an infimum. See the attachments to this entry 
for examples. 
73.3 sets that do not have an infimum
Some examples for sets that do not have an infimum:


• The set $M_1 := \mathbb{Q}$ (as a subset of $\mathbb{Q}$) does not have an infimum (nor a supremum).
Intuitively this is clear, as the set is unbounded. The formal proof is left as an
exercise for the reader.


• A more interesting example: The set $M_2 := \{x \in \mathbb{Q} : x^2 \ge 2, x > 0\}$ (again as a subset
of $\mathbb{Q}$).
Now $\inf(M_2) > 0$. Assume $i > 0$ is an infimum of $M_2$. Now we use the fact
that $\sqrt{2}$ is not rational, and therefore $i < \sqrt{2}$ or $i > \sqrt{2}$.
If $i < \sqrt{2}$, choose any $j \in \mathbb{Q}$ from the interval $(i, \sqrt{2}) \subset \mathbb{R}$ (this is a real interval, but
as the rational numbers are dense in the real numbers, every nonempty interval in $\mathbb{R}$
contains a rational number, hence such a $j$ exists).


Then $j > i$, but $j < \sqrt{2}$, hence $j^2 < 2$ and therefore $j$ is a lower bound for $M_2$ that is
greater than $i$, which is a contradiction.


On the other hand, if $i > \sqrt{2}$, the argument is very similar: Choose any $j \in \mathbb{Q}$ from the
interval $(\sqrt{2}, i) \subset \mathbb{R}$. Then $j < i$, but $j > \sqrt{2}$, hence $j^2 > 2$ and therefore $j \in M_2$. Thus
$M_2$ contains an element $j$ smaller than $i$, which is a contradiction to the assumption
that $i = \inf(M_2)$.


Intuitively speaking, this example exploits the fact that $\mathbb{Q}$ does not have “enough
elements”. More formally, $\mathbb{Q}$ as a metric space is not complete. The $M_2$ defined above
is the real interval $M_2' := (\sqrt{2}, \infty) \subset \mathbb{R}$ intersected with $\mathbb{Q}$. $M_2'$ as a subset of $\mathbb{R}$ does
have an infimum (namely $\sqrt{2}$), but as that is not an element of $\mathbb{Q}$, $M_2$ does not have
an infimum as a subset of $\mathbb{Q}$.


This example also makes it clear that it is important to clearly state the superset one
is working in when using the notion of infimum or supremum.


It also illustrates that the infimum is a natural generalization of the minimum of a
set, as a set that does not have a minimum may still have an infimum (such as $M_2'$).


Of course all the ideas expressed here equally apply to the supremum, as the two
notions are completely analogous (just reverse all inequalities).
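

The two cases of the argument about $M_2$ can be made explicit with exact rational arithmetic. The map $i \mapsto (2i + 2)/(i + 2)$ used below is an assumption introduced for this sketch (the entry only asserts that a suitable rational $j$ exists); for positive rational $i$ it yields a rational strictly between $i$ and $\sqrt{2}$, because $j - i = (2 - i^2)/(i + 2)$ and $j^2 - 2 = 2(i^2 - 2)/(i + 2)^2$.

```python
from fractions import Fraction


def step_towards_sqrt2(i):
    """Map a positive rational i to j = (2i + 2)/(i + 2), which lies strictly
    between i and sqrt(2)."""
    i = Fraction(i)
    return (2 * i + 2) / (i + 2)


low = Fraction(7, 5)            # 49/25 < 2, so 7/5 is a lower bound of M_2
better_low = step_towards_sqrt2(low)

high = Fraction(3, 2)           # 9/4 > 2, so 3/2 is an element of M_2
smaller_member = step_towards_sqrt2(high)
```

So no rational lower bound is the greatest one, and no element of $M_2$ is the least one.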
73.4 supremum


The supremum of a set $S$ is the least upper bound of $S$ and is denoted $\sup(S)$.


Let $A$ be a set with a partial order $\le$, and let $S \subseteq A$. For any $x \in A$, $x$ is an upper bound
of $S$ if $y \le x$ for any $y \in S$. The supremum of $S$ is the least such upper bound; that is, if $b$
is an upper bound of $S$, then $\sup(S) \le b$.


Note that it is not necessarily the case that $\sup(S) \in S$. Suppose $S = (0, 1)$; then $\sup(S) = 1$,
but $1 \notin S$.


Note also that a set may not have an upper bound at all. 
73.5 upper bound


Let $S$ be a set with an ordering relation $\le$, and let $T$ be a subset of $S$. An upper bound for
$T$ is an element $z \in S$ such that $x \le z$ for all $x \in T$. We say that $T$ is bounded from above
if there exists an upper bound for T. 


Lower bound, and bounded from below are defined in a similar manner. 
Chapter 74


06A99 — Miscellaneous


74.1 dense (in a poset)


If $(P, \le)$ is a poset then a subset $Q \subseteq P$ is dense if for any $p \in P$ there is some $q \in Q$ such
that $q \le p$.
74.2 partial order


A partial order (often simply referred to as an order or ordering) is a relation $\le \; \subseteq A \times A$
that satisfies the following three properties:
1. Reflexivity: $a \le a$ for all $a \in A$
2. Antisymmetry: If $a \le b$ and $b \le a$ for any $a, b \in A$, then $a = b$
3. Transitivity: If $a \le b$ and $b \le c$ for any $a, b, c \in A$, then $a \le c$


A total order is a partial order that satisfies a fourth property known as comparability.
A set and a partial order on that set define a poset.
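

On a finite set the three properties, and comparability, can be tested exhaustively. This is a brute-force sketch with hypothetical helper names; divisibility on $\{1, \ldots, 12\}$ serves as an example of a partial order that is not total.

```python
from itertools import product


def is_partial_order(elements, leq):
    elements = list(elements)
    reflexive = all(leq(a, a) for a in elements)
    antisymmetric = all(a == b or not (leq(a, b) and leq(b, a))
                        for a, b in product(elements, repeat=2))
    transitive = all(leq(a, c) or not (leq(a, b) and leq(b, c))
                     for a, b, c in product(elements, repeat=3))
    return reflexive and antisymmetric and transitive


def is_total(elements, leq):
    """Comparability: any two elements are related one way or the other."""
    elements = list(elements)
    return all(leq(a, b) or leq(b, a) for a, b in product(elements, repeat=2))


def divides(a, b):
    return b % a == 0


numbers = range(1, 13)
divisibility_is_order = is_partial_order(numbers, divides)
divisibility_is_total = is_total(numbers, divides)  # 2 and 3 are incomparable
```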
74.3 poset
A poset is a partially ordered set, that is, a poset is a pair $(P, \le)$ where $\le$ is a partial order
relation on $P$.


A few examples:


• $(\mathbb{Z}, \le)$ where $\le$ is the common “less than or equal” relation on the integers.


• $(\mathcal{P}(X), \subseteq)$. $\mathcal{P}(X)$ is the power set of $X$ and the relation $\subseteq$ is given by the common
inclusion of sets.


In a partial order, not every two elements need to be comparable. As an example, consider
$X = \{a, b, c\}$ and the poset on its power set given by inclusion. Here, $\{a\} \subseteq \{a, c\}$,
but the two subsets $\{a, b\}$ and $\{a, c\}$ are not comparable. (Neither $\{a, b\} \subseteq \{a, c\}$ nor
$\{a, c\} \subseteq \{a, b\}$.)
74.4 quasi-order
A quasi-order on a set $S$ is a relation $\lesssim$ on $S$ satisfying the following two axioms:


1. reflexivity: $s \lesssim s$ for all $s \in S$, and
2. transitivity: If $s \lesssim t$ and $t \lesssim u$, then $s \lesssim u$; for all $s, t, u \in S$.


Given such a relation, the relation $s \sim t :\Leftrightarrow (s \lesssim t) \wedge (t \lesssim s)$ is an equivalence relation on
$S$, and $\lesssim$ induces a partial order $\le$ on the set $S/\!\sim$ of equivalence classes of $\sim$ defined by
$$[s] \le [t] :\Leftrightarrow s \lesssim t,$$
where $[s]$ and $[t]$ denote the equivalence classes of $s$ and $t$. In particular, $\le$ does satisfy
antisymmetry, whereas $\lesssim$ may not.
74.5 well quasi ordering


A quasi-order $(Q, \lesssim)$ is a well-quasi-ordering (wqo) if for every infinite sequence $a_1, a_2, a_3, \ldots$
from $Q$ there exist $i < j \in \mathbb{N}$ such that $a_i \lesssim a_j$. An infinite sequence from $Q$ is usually
referred to as bad if for all $i < j$, $a_i \not\lesssim a_j$ holds; otherwise it is called good. Note that an
infinite antichain is obviously a bad sequence.
The following proposition gives equivalent definitions for well-quasi-ordering:
Proposition 3. Given a set $Q$ and a binary relation $\lesssim$ over $Q$, the following conditions are
equivalent:


• $(Q, \lesssim)$ is a well-quasi-ordering;


• $(Q, \lesssim)$ has no infinite ($\omega$-) strictly decreasing chains and no infinite antichains.


• Every linear extension of $Q/\!\sim$ is a well-order, where $\sim$ is the equivalence relation and
$Q/\!\sim$ is the set of equivalence classes induced by $\lesssim$.


• Any infinite ($\omega$-) $Q$-sequence contains an increasing chain.


The equivalence of WQO to the second and the fourth conditions is proved by the infinite
version of Ramsey’s theorem.
Chapter 75


06B10 — Ideals, congruence relations 


75.1 order in an algebra 

Let $A$ be an algebra finitely generated over $\mathbb{Q}$. An order $R$ of $A$ is a subring of $A$ which is
finitely generated as a $\mathbb{Z}$-module and which satisfies $R \otimes \mathbb{Q} = A$.

Remark: The algebra $A$ is not necessarily commutative.


Examples:


1. The ring of integers in a number field is an order, known as the maximal order.


2. Let $K$ be a quadratic imaginary field and $O$ its ring of integers. Then for each integer
$n$, the ring $\mathbb{Z} + nO$ is an order of $K$ (in fact it can be proved that every order of $K$ is
of this form).


Reference: Joseph H. Silverman, The Arithmetic of Elliptic Curves, Springer-Verlag, New
York, 1986.
Chapter 76


06C05 — Modular lattices, 
Desarguesian lattices 


76.1 modular lattice 


A lattice $L$ is said to be modular if $x \vee (y \wedge z) = (x \vee y) \wedge z$ for all $x, y, z \in L$ such that $x \le z$.


The following are examples of modular lattices.


• All distributive lattices


• The lattice of normal subgroups of any group
• The lattice of submodules of any module


A finite lattice $L$ is modular if and only if it is graded and its rank function $\rho$ satisfies
$\rho(x) + \rho(y) = \rho(x \wedge y) + \rho(x \vee y)$ for all $x, y \in L$.
Chapter 77


06D99 — Miscellaneous


77.1 distributive
Given a set $S$ with two binary operations $+ : S \times S \to S$ and $\cdot : S \times S \to S$, we say that $\cdot$ is
right distributive over $+$ if
$$(a + b) \cdot c = (a \cdot c) + (b \cdot c) \text{ for all } a, b, c \in S$$
and left distributive over $+$ if
$$a \cdot (b + c) = (a \cdot b) + (a \cdot c) \text{ for all } a, b, c \in S.$$
If $\cdot$ is both left and right distributive over $+$, then it is said to be distributive over $+$.


77.2 distributive lattice


A lattice is said to be distributive if it satisfies either (and therefore both) of the distributive laws:
• $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$


• $x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z)$


Every distributive lattice is modular.
Examples of distributive lattices include Boolean lattices and totally ordered sets.
Chapter 78


06E99 — Miscellaneous 


78.1 Boolean ring 


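A Boolean ring is a ring $R$ that has a unit element, and in which every element is idempotent.
In other words,
$$x^2 = x, \quad \forall x \in R.$$


Example of a Boolean ring:
Let $R$ be the ring $\mathbb{Z}_2 \times \mathbb{Z}_2$ with the operations being coordinate-wise. Then we can check:
$$(0,0) \cdot (0,0) = (0,0), \quad (1,0) \cdot (1,0) = (1,0), \quad (0,1) \cdot (0,1) = (0,1), \quad (1,1) \cdot (1,1) = (1,1);$$
the four elements that form the ring are idempotent. So, $R$ is Boolean.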
Chapter 79


08A40 — Operations, polynomials, 
primal algebras 


79.1 coefficients of a polynomial 


If $p = \sum_{i=0}^{n} a_i x^i$ is a polynomial, its coefficients are $\{a_i\}_{i=0}^{n}$.
Chapter 80


08A99 — Miscellaneous


80.1 binary operation 


A binary operation on a set $X$ is a function from $X \times X$ to $X$.


Rather than using function notation, it is usual to write binary operations with an opera- 
tion symbol between elements, or even with no operation at all, it being understood that 
juxtaposed elements are to be combined using an operation that should be clear from the 
context. 


Thus, addition of real numbers is the operation
$$(x, y) \mapsto x + y,$$
and multiplication in a groupoid is the operation
$$(x, y) \mapsto xy.$$
80.2 filtered algebra


Definition 2. A filtered algebra over the field $k$ is an algebra $(A, \cdot)$ over $k$ which is endowed
with a filtration $F = \{F_i\}_{i \in \mathbb{N}}$ compatible with the multiplication in the following sense:
$$\forall m, n \in \mathbb{N}, \quad F_m \cdot F_n \subseteq F_{n+m}.$$

A special case of a filtered algebra is a graded algebra. In general there is the following
construction that produces a graded algebra out of a filtered algebra.


Definition 3. Let $(A, \cdot, F)$ be a filtered algebra; then the associated graded algebra $G(A)$
is defined as follows:


• As a vector space
$$G(A) = \bigoplus_{n \in \mathbb{N}} G_n,$$
where
$$G_0 = F_0, \text{ and } \forall n > 0, \; G_n = F_n / F_{n-1},$$


• the multiplication is defined, for $x \in F_n$ and $y \in F_m$, by
$$(x + F_{n-1})(y + F_{m-1}) = x \cdot y + F_{n+m-1}.$$


Theorem 6. The multiplication is well defined and endows $G(A)$ with the structure of a
graded algebra, with gradation $\{G_n\}_{n \in \mathbb{N}}$. Furthermore if $A$ is associative then so is $G(A)$.


An example of a filtered algebra is the Clifford algebra $\mathrm{Cliff}(V, q)$ of a vector space $V$ endowed
with a quadratic form $q$. The associated graded algebra is $\bigwedge V$, the exterior algebra of $V$.


As algebras $A$ and $G(A)$ are distinct (with the exception of the trivial case that $A$ is graded)
but as vector spaces they are isomorphic.


Theorem 7. The underlying vector spaces of $A$ and $G(A)$ are isomorphic.
Chapter 81


11-00 — General reference works 
(handbooks, dictionaries, 
bibliographies, etc.) 


81.1 Euler phi-function 


For any positive integer $n$, $\phi(n)$ is the number of positive integers less than or equal to $n$
which are coprime to $n$. This is known as the Euler $\phi$-function. Among its useful properties
are the facts that $\phi$ is multiplicative, meaning if $\gcd(a, b) = 1$, then $\phi(ab) = \phi(a)\phi(b)$, and
that $\phi(p^k) = p^{k-1}(p - 1)$ if $p$ is prime. These two facts combined give a numeric computation
of $\phi$ for all integers:
$$\phi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right).$$


For example,
$$\phi(2000) = \phi(2^4 \cdot 5^3) = 2000\left(1 - \tfrac{1}{2}\right)\left(1 - \tfrac{1}{5}\right) = 2000\left(\tfrac{1}{2}\right)\left(\tfrac{4}{5}\right) = \frac{8000}{10} = 800.$$

In addition,
$$\sum_{d \mid n} \phi(d) = n$$
where the sum extends over all positive divisors of $n$. Also, $\phi(n)$ is the number of units in
the ring $\mathbb{Z}/n\mathbb{Z}$ of integers modulo $n$.
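

The product formula translates directly into code. The sketch below uses trial division to find the prime divisors, which is an implementation choice made for the illustration and is only suitable for small $n$; the function names are hypothetical.

```python
from math import gcd


def prime_factors(n):
    """Set of primes dividing n, found by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def phi(n):
    """Euler phi via n * prod(1 - 1/p), kept in integer arithmetic."""
    result = n
    for p in prime_factors(n):
        result = result // p * (p - 1)
    return result


def phi_by_counting(n):
    """Direct count of 1 <= m <= n with gcd(m, n) = 1, for comparison."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


divisor_sum = sum(phi(d) for d in range(1, 2001) if 2000 % d == 0)
```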
81.2 Euler-Fermat theorem


Given $a, n \in \mathbb{Z}$, $a^{\phi(n)} \equiv 1 \pmod{n}$ when $\gcd(a, n) = 1$, where $\phi$ is the Euler totient function.


81.3 Fermat’s little theorem


If $a, p \in \mathbb{Z}$ with $p$ a prime and $p \nmid a$, then $a^{p-1} \equiv 1 \pmod{p}$.
81.4 Fermat’s theorem proof


Consider the sequence $a, 2a, \ldots, (p - 1)a$.


They are all different (modulo $p$) because if $ma \equiv na$ with $1 \le m < n \le p - 1$ then
$0 \equiv a(m - n)$ and since $p \nmid a$ we get $p \mid (m - n)$, which is impossible.


Now, since all these numbers are different and none is divisible by $p$, the set $\{a, 2a, 3a, \ldots, (p - 1)a\}$ will have the $p - 1$
possible nonzero congruence classes (although not necessarily in the same order) and therefore
$$a \cdot 2a \cdot 3a \cdots (p - 1)a \equiv (p - 1)! \, a^{p-1} \equiv (p - 1)! \pmod{p}$$
and using $\gcd((p - 1)!, p) = 1$ we get
$$a^{p-1} \equiv 1 \pmod{p}.$$
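

The key step, that multiplying by $a$ merely permutes the nonzero residues, can be observed for a concrete prime. The values $p = 13$ and $a = 5$ are arbitrary choices for this sketch.

```python
from math import factorial


def multiples_mod_p(a, p):
    """Residues of a, 2a, ..., (p-1)a modulo p."""
    return [(m * a) % p for m in range(1, p)]


p, a = 13, 5
residues = multiples_mod_p(a, p)

# The residues are a rearrangement of 1, ..., p-1.
is_permutation = sorted(residues) == list(range(1, p))

# Both sides of the product congruence, reduced modulo p.
lhs = (factorial(p - 1) * pow(a, p - 1, p)) % p
rhs = factorial(p - 1) % p
```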


Version: 3 Owner: drini Author(s): drini 


81.5 Goldbach’s conjecture 


The conjecture states that every even integer $n > 2$ is expressible as the sum of two primes.


In 1966 Chen proved that every sufficiently large even number can be expressed as the sum
of a prime and a number with at most two prime divisors.


Vinogradov proved that every sufficiently large odd number is a sum of three primes. In
1997 it was shown by J.-M. Deshouillers, G. Effinger, H. te Riele, and D. Zinoviev that,
assuming the generalized Riemann hypothesis, every odd number $n > 5$ can be represented as
a sum of three primes.


The conjecture was first proposed in a 1742 letter from Christian Goldbach to Euler and 
still remains unproved. 
81.6 Jordan’s totient function


Let $p$ be a prime, and $k$ and $n$ natural numbers. Then
$$J_k(n) = n^k \prod_{p \mid n} \left(1 - p^{-k}\right)$$
where the product is over the prime divisors $p$ of $n$.
This is a generalization of Euler’s totient function.
81.7 Legendre symbol


Legendre Symbol.


Let $p$ be an odd prime. The symbol $\left(\frac{a}{p}\right)$ or $(a \mid p)$ has the value 1 if $a$ is a quadratic residue
modulo $p$, $-1$ if $a$ is not a quadratic residue, or 0 if $p$ divides $a$. The symbol defined this
way is called the Legendre symbol.


The Legendre symbol can be computed by means of Euler’s criterion or Gauss’ lemma.
A generalization of this symbol is the Jacobi symbol.
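

A short sketch of computing the Legendre symbol follows. It relies on Euler's criterion, a standard result that is not stated in this entry: for an odd prime $p$, $a^{(p-1)/2}$ is congruent to the Legendre symbol modulo $p$. The function names are hypothetical, and the primality of $p$ is assumed rather than checked.

```python
def legendre(a, p):
    """Legendre symbol (a | p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)  # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r


def legendre_by_definition(a, p):
    """Brute-force version straight from the definition, for comparison."""
    if a % p == 0:
        return 0
    return 1 if any((x * x - a) % p == 0 for x in range(1, p)) else -1


symbols_mod_7 = [legendre(a, 7) for a in range(7)]
```

Modulo 7 the quadratic residues are 1, 2 and 4, which is what the computed list reflects.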
81.8 Pythagorean triplet


A Pythagorean triplet is a set $\{a, b, c\}$ of three positive integers such that
$$a^2 + b^2 = c^2.$$


That is, $\{a, b, c\}$ is a Pythagorean triplet if there exists a right triangle whose sides are $a$, $b$, $c$.


If $\{a, b, c\}$ is a Pythagorean triplet, so is $\{ka, kb, kc\}$. If $a, b, c$ are coprime, then we say that
the triplet is primitive.


All the primitive Pythagorean triplets are given by
$$a = 2mn, \qquad b = m^2 - n^2, \qquad c = m^2 + n^2$$
where $m, n$ are any two coprime positive integers, one odd and the other even, with $m > n$.
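

The parametrization can be turned into a small generator of primitive triplets. This is a sketch; the function name and the bound on the hypotenuse are assumptions made for the illustration.

```python
from math import gcd


def primitive_triples(limit):
    """Primitive triplets (a, b, c) = (2mn, m^2 - n^2, m^2 + n^2) with c <= limit."""
    triples = []
    m = 2
    while m * m + 1 <= limit:
        for n in range(1, m):
            # m and n coprime, one odd and the other even.
            if (m - n) % 2 == 1 and gcd(m, n) == 1:
                a, b, c = 2 * m * n, m * m - n * n, m * m + n * n
                if c <= limit:
                    triples.append((a, b, c))
        m += 1
    return triples


small = primitive_triples(30)
```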
81.9 Wilson’s theorem


Wilson’s theorem states that
$$(p - 1)! \equiv -1 \pmod{p}$$
for prime numbers $p$.
81.10 arithmetic mean


Arithmetic Mean.
If $a_1, a_2, \ldots, a_n$ are real numbers, we define the arithmetic mean of them as
$$\mathrm{A.M.} = \frac{a_1 + a_2 + \cdots + a_n}{n}.$$


The arithmetic mean is what is commonly called the average of the numbers.
81.11 ceiling


The ceiling function gives the smallest integer greater than or equal to its argument. It is usually
denoted as $\lceil x \rceil$.


Some examples: $\lceil 6.2 \rceil = 7$, $\lceil 0.4 \rceil = 1$, $\lceil 7 \rceil = 7$, $\lceil -5.1 \rceil = -5$, $\lceil \pi \rceil = 4$, $\lceil -4 \rceil = -4$.


Note that this function is NOT the integer part ($[x]$), since $\lceil 3.5 \rceil = 4$ and $[3.5] = 3$.
81.12 computation of powers using Fermat’s little theorem

A straightforward application of Fermat’s theorem consists of rewriting the power of an integer mod n. Suppose we have $x \equiv a^b \pmod{n}$ with $a \in U(n)$. Then, by Fermat’s theorem (in the general form due to Euler) we have

$$a^{\phi(n)} \equiv 1 \pmod{n},$$

so

$$x \equiv a^b \equiv a^b (1)^k \equiv a^b a^{k\phi(n)} \equiv a^{b + k\phi(n)} \pmod{n}$$

for any integer k. This means we can replace b by any integer congruent to it mod $\phi(n)$. In particular we have

$$x \equiv a^{\,b \,\%\, \phi(n)} \pmod{n}$$

where $b \,\%\, \phi(n)$ denotes the remainder of b upon division by $\phi(n)$.

This can be used to make the computation of large powers easier. It also allows one to find an easy to compute inverse to the map $x \mapsto x^b \pmod{n}$ whenever $b \in U(\phi(n))$. In fact, this is just $x \mapsto x^{b^{-1}}$, where $b^{-1}$ is an inverse to b mod $\phi(n)$. This forms the basis of the RSA cryptosystem, where a message x is encrypted by raising it to the bth power, giving $x^b$, and is decrypted by raising it to the $b^{-1}$th power, giving

$$(x^b)^{b^{-1}},$$

which, by the above argument, is just

$$x^{b b^{-1}} \equiv x \pmod{n},$$

the original message!
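The exponent reduction and the encrypt/decrypt round trip can be sketched as follows. This is an illustration under stated assumptions: the modulus 3233 = 61 · 53, the exponent 17, and the message 65 are toy values chosen here, and `phi` is a naive helper written for this example, suitable only for small n.

```python
from math import gcd

def phi(n):
    """Euler's totient by direct count (only sensible for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

n = 3233                      # toy modulus, 61 * 53 (assumed for illustration)
a, b = 7, 10**6 + 3
assert gcd(a, n) == 1         # a must be a unit mod n

# Replace the exponent b by its remainder modulo phi(n).
reduced = b % phi(n)
assert pow(a, b, n) == pow(a, reduced, n)

# Toy RSA-style round trip: e plays the role of b, d of its inverse mod phi(n).
e = 17
d = pow(e, -1, phi(n))        # modular inverse; needs Python 3.8+
message = 65
cipher = pow(message, e, n)
assert pow(cipher, d, n) == message
```

Real RSA uses moduli of thousands of bits and padding schemes; nothing here should be read as a secure implementation.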
81.13 congruences

Let a, b be integers and m a non-zero integer. We say that a is congruent to b modulo m if m divides b − a. We write this as

$$a \equiv b \pmod{m}.$$

If a and b are congruent modulo m, it means that both numbers leave the same residue when divided by m.

Congruence with a fixed modulus is an equivalence relation on Z. The set of equivalence classes is a cyclic group of order m with respect to the sum and a ring if we consider multiplication modulo m. This ring is usually denoted as

$$\mathbb{Z}/m\mathbb{Z}.$$

This ring is also commonly denoted as $\mathbb{Z}_m$, although that notation is also used to represent the m-adic integers.
81.14 coprime

Two integers a, b are coprime if their greatest common divisor is 1. It is also said that a, b are relatively prime.
81.15 cube root

The cube root of a real number x, written as $\sqrt[3]{x}$, is the real number y such that $y^3 = x$. Equivalently, $(\sqrt[3]{x})^3 = x$. Or, $\sqrt[3]{x} \times \sqrt[3]{x} \times \sqrt[3]{x} = x$.

Example: $\sqrt[3]{-8} = -2$ because $(-2)^3 = -2 \times -2 \times -2 = -8$.

Example: $\sqrt[3]{x^3 + 3x^2 + 3x + 1} = x + 1$ because $(x+1)^3 = (x+1)(x+1)(x+1) = (x^2 + 2x + 1)(x+1) = x^3 + 3x^2 + 3x + 1$.

The cube root operation is distributive for multiplication and division, but not for addition and subtraction.

That is, $\sqrt[3]{x \times y} = \sqrt[3]{x} \times \sqrt[3]{y}$, and $\sqrt[3]{\frac{x}{y}} = \frac{\sqrt[3]{x}}{\sqrt[3]{y}}$.

However, in general, $\sqrt[3]{x + y} \neq \sqrt[3]{x} + \sqrt[3]{y}$ and $\sqrt[3]{x - y} \neq \sqrt[3]{x} - \sqrt[3]{y}$.

Example: $\sqrt[3]{x^3 y^3} = xy$ because $(xy)^3 = xy \times xy \times xy = x^3 \times y^3 = x^3 y^3$.

Example: $\sqrt[3]{\frac{8}{125}} = \frac{2}{5}$ because $\left(\frac{2}{5}\right)^3 = \frac{2^3}{5^3} = \frac{8}{125}$.

The cube root notation is actually an alternative to exponentiation. That is, $\sqrt[3]{x} = x^{1/3}$. As such, the cube root operation is associative with exponentiation. That is, $\sqrt[3]{x^2} = x^{2/3} = (\sqrt[3]{x})^2$.
81.16 floor

The floor function gives the greatest integer less than or equal to its argument. It is usually denoted as $\lfloor x \rfloor$.

Some examples: $\lfloor 6.2 \rfloor = 6$, $\lfloor 0.4 \rfloor = 0$, $\lfloor 7 \rfloor = 7$, $\lfloor -5.1 \rfloor = -6$, $\lfloor \pi \rfloor = 3$, $\lfloor -4 \rfloor = -4$.

Note that this function is NOT the integer part ($[x]$), since $\lfloor -3.5 \rfloor = -4$ and $[-3.5] = -3$.

However, both functions agree for nonnegative numbers.

In some texts, however, the bracket notation is used to denote the floor function (although they actually work with the integer part), so it is sometimes also called the bracket function.
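The difference between floor, ceiling, and integer part is easy to see numerically. In this sketch, Python's `math.trunc` (truncation toward zero) stands in for the integer part $[x]$ as it is used in this entry; that identification is an assumption of the example.

```python
import math

# Columns: x, floor(x), ceiling(x), integer part (truncation toward zero)
for x in (6.2, 0.4, 7, -5.1, math.pi, -4, 3.5, -3.5):
    print(x, math.floor(x), math.ceil(x), math.trunc(x))
```

For nonnegative arguments the floor and the truncation coincide; for negative non-integers they differ by one.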
81.17 geometric mean

Geometric Mean.

If $a_1, a_2, \ldots, a_n$ are real numbers, we define their geometric mean as

$$\mathrm{G.M.} = \sqrt[n]{a_1 a_2 \cdots a_n}.$$

(We usually require the numbers to be nonnegative so the mean always exists.)
81.18 googol

A googol equals the number $10^{100}$, that is, a one followed by one hundred zeros. A googolplex is ten raised to the power of a googol, i.e., $10^{(10^{100})}$.

Although these numbers do not have much use in traditional mathematics, they are useful for illustrating what “big” can mean in mathematics. Written out in numbers a googol is


10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 


This is already a huge number. For instance, it is more than the number of atoms in the known universe. A googolplex is even larger. In fact, since a googolplex has a googol zeros in its decimal representation, a googolplex has more digits than there are atoms in our universe. Thus, even if all matter in the universe were at disposal, it would not be possible to write down the decimal representation of a googolplex [1].

Properties

1. A googol is approximately the factorial of 70. The only prime factors of a googol and a googolplex are 2 and 5 [1].

2. Using Stirling’s formula we can approximate the factorial of a googol to obtain $(10^{100})! \approx 10^{(9.95 \cdot 10^{101})}$.

History and etymology

The googol was created by the American mathematician Edward Kasner (1878-1955) [4] in [2] to illustrate the difference between an unimaginably large number and infinity. The name ’googol’ was coined by Kasner’s nine-year-old nephew Milton Sirotta in 1938 when asked to give a name for a huge number. The name googol was perhaps influenced by the comic strip character Barney Google [1] [3].

REFERENCES
NY, USA: Simon and Schuster, 1967; Dover Pubns, April 2001; London: Penguin, 1940, ISBN 0486417034).

3. Douglas Harper’s Etymology online dictionary, Googol, 12/2003.
81.19 googolplex

A googolplex is $10^{10^{100}}$; that is, 10 raised to the googol-th power. This can also be viewed as a one followed by a googol zeros.
81.20 greatest common divisor

Let a and b be given integers with at least one of them different from zero. The greatest common divisor of a and b, denoted by gcd(a, b), is the positive integer d satisfying

1. d | a and d | b,

2. if c | a and c | b, then c ≤ d.

More intuitively, the greatest common divisor is the largest integer dividing both a and b.
81.21 group theoretic proof of Wilson’s theorem

Here we present a group theoretic proof of Wilson’s theorem. Clearly, it is enough to show that $(p-2)! \equiv 1 \pmod{p}$, since $p - 1 \equiv -1 \pmod{p}$. By Sylow’s theorems, the p-Sylow subgroups of $S_p$, the symmetric group on p elements, have order p, and the number $n_p$ of Sylow subgroups is congruent to 1 modulo p. Let P be a Sylow subgroup of $S_p$. Note that P is generated by a p-cycle. There are $(p-1)!$ cycles of length p in $S_p$. Each p-Sylow subgroup contains $p - 1$ cycles of length p, hence there are $\frac{(p-1)!}{p-1} = (p-2)!$ different p-Sylow subgroups in $S_p$, i.e. $n_p = (p-2)!$. From Sylow’s Second Theorem, it follows that $(p-2)! \equiv 1 \pmod{p}$, so $(p-1)! \equiv -1 \pmod{p}$.
81.22 harmonic mean

If $a_1, a_2, \ldots, a_n$ are positive numbers, we define their harmonic mean as:

$$\mathrm{H.M.} = \frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}}.$$

If you travel from city A to city B at x miles per hour, and then you travel back at y miles per hour, what was the average velocity for the whole trip?
The harmonic mean of x and y. That is, the average velocity is

$$\frac{2}{\frac{1}{x} + \frac{1}{y}} = \frac{2xy}{x + y}.$$
81.23 mean

A mean is a homogeneous function f whose domain is the collection of all finite multisets of R and whose codomain is R, such that for any set $S = \{x_1, x_2, x_3, \ldots, x_n\}$ of real numbers

$$\min\{x_1, x_2, x_3, \ldots, x_n\} \le f(S) \le \max\{x_1, x_2, x_3, \ldots, x_n\}.$$

Pythagoras identified three types of means: the arithmetic mean, the geometric mean, and the harmonic mean. Other well-known means include: the median, the mode, the arithmetic-geometric mean, the arithmetic-harmonic mean, the harmonic-geometric mean, the root-mean-square (sometimes called the quadratic mean), the identric mean, the Heronian mean, and the Cesàro mean. Even the minimum and maximum functions are means (though vacuously so).

It should be noted that the arithmetic mean is sometimes simply referred to as the “mean.”
81.24 number field

A field which is a finite extension of Q, the rational numbers, is called a number field.
81.25 pi

The number π is the ratio between the perimeter and the diameter of any given circle. That is, in any circle, dividing the perimeter by the diameter always gives the same answer: 3.14159265358...

Over human history there were many attempts to calculate this number precisely. One of the oldest approximations appears in the Rhind Papyrus (circa 1650 B.C.), where a geometrical construction is given in which $(16/9)^2 = 3.1604\ldots$ is used as an approximation to π, although this was not explicitly mentioned.

It wasn’t until the Greeks that there were systematic attempts to calculate π. Archimedes [1], in the third century B.C., used regular polygons inscribed in and circumscribed about a circle to approximate π: the more sides a polygon has, the closer to the circle it becomes, and therefore the ratio between the polygon’s area and the square of the radius yields approximations to π. Using this method he showed that 223/71 < π < 22/7 (3.140845... < π < 3.142857...).

Around the world there were also attempts to calculate π. Brahmagupta [1] gave the value of $\sqrt{10} = 3.16227\ldots$ using a method similar to Archimedes’. Chinese mathematician Tsu Chung-Chih (ca. 500 A.D.) gave the approximation 355/113 = 3.141592920....

Later, in the thirteenth century, Leonardo of Pisa (Fibonacci) [1] used 96-sided regular polygons to find the approximation 864/275 = 3.141818...

For centuries, variations on Archimedes’ method were the only tool known, but Viète [1] gave in 1593 the formula

$$\frac{2}{\pi} = \sqrt{\frac{1}{2}} \, \sqrt{\frac{1}{2} + \frac{1}{2}\sqrt{\frac{1}{2}}} \, \sqrt{\frac{1}{2} + \frac{1}{2}\sqrt{\frac{1}{2} + \frac{1}{2}\sqrt{\frac{1}{2}}}} \cdots$$

which was the first analytical expression for π involving infinite summations or products. Later, with the advent of calculus, many of these formulas were discovered. Some examples are Wallis’ [1] formula:

$$\frac{\pi}{2} = \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdot \frac{6}{7} \cdots$$

and Leibniz’s formula,

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \cdots$$

obtained by developing $\arctan(1) = \pi/4$ using power series, and with some more advanced techniques,

$$\pi = \sqrt{6\zeta(2)},$$

found by determining the value of the Riemann zeta function at s = 2.

The Leibniz expression provides an alternate way to define π (namely 4 times the limit of the series) and it is one of the formal ways to define π when studying analysis in order to avoid the geometrical definition.
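The convergence of these expressions can be explored numerically. The sketch below evaluates partial sums and products of the Leibniz series (π/4 = 1 − 1/3 + 1/5 − ...), the Wallis product (π/2 = (2/1)(2/3)(4/3)(4/5)...), and the zeta-value formula (π = sqrt(6 ζ(2))). The function names and the truncation lengths are assumptions of this example.

```python
import math

def leibniz(terms):
    """4 * (1 - 1/3 + 1/5 - ...), truncated after `terms` terms."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

def wallis(pairs):
    """2 * product of (2k)^2 / ((2k-1)(2k+1)) for k = 1..pairs."""
    prod = 1.0
    for k in range(1, pairs + 1):
        prod *= (2 * k) ** 2 / ((2 * k - 1) * (2 * k + 1))
    return 2 * prod

def from_zeta2(terms):
    """sqrt(6 * (1 + 1/4 + 1/9 + ...)), truncated after `terms` terms."""
    return math.sqrt(6 * sum(1 / k ** 2 for k in range(1, terms + 1)))

for f in (leibniz, wallis, from_zeta2):
    print(f.__name__, f(100_000), abs(f(100_000) - math.pi))
```

All three converge slowly (the error shrinks roughly like one over the number of terms), which is why they are of more theoretical than computational interest.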


It is known that π is not a rational number (quotient of two integers). Moreover, π is not algebraic over the rationals (that is, it is a transcendental number). This means that no nonzero polynomial with rational coefficients can have π as a root. Its irrationality implies that its decimal expansion (or its expansion in any integer base, for that matter) is neither finite nor periodic.
81.26 proof of Wilson’s theorem

We begin by noting that

$$(p-1)! = (p-1)(p-2) \cdots (2)(1).$$

Since we are working mod p with p prime, all the numbers 2, ..., p − 2 have inverses mod p, and each can be paired with its inverse within 2, ..., p − 2. (No number in this range is its own inverse, since $x^2 \equiv 1 \pmod{p}$ forces p to divide x − 1 or x + 1.) This leaves 1, which is its own inverse, and p − 1, also its own inverse. Hence we can write

$$(p-1)! \equiv (p-1)(1) \cdots (1) \pmod{p},$$

but then

$$(p-1)! \equiv (p-1) \equiv -1 \pmod{p}.$$

Hence

$$(p-1)! \equiv -1 \pmod{p}. \qquad \Box$$
81.27 proof of fundamental theorem of arithmetic

If n is prime, there is nothing to prove, so assume n is composite. Then there exists an integer d such that 1 < d < n and d | n. By the well-ordering principle, we can pick the smallest such integer; call it $p_1$. If $p_1$ were composite, then it would have a divisor $1 < q < p_1$, but $q \mid p_1 \Rightarrow q \mid n$, contradicting minimality of $p_1$. Thus $p_1$ is prime. Write $n = p_1 n_1$. If $n_1$ is prime, we have the desired factorization. Otherwise, the same argument yields a new prime, $p_2$, such that $n_1 = p_2 n_2 \Rightarrow n = p_1 p_2 n_2$. The decreasing sequence $n > n_1 > n_2 > \cdots > 1$ cannot continue indefinitely, so at some point $n_{k-1}$ is a prime; call it $p_k$. This leads to the prime factorization $n = p_1 p_2 \cdots p_k$.

To show uniqueness, assume we have two prime factorizations

$$n = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s.$$

Assume without loss of generality that r ≤ s, and that our primes are written in increasing magnitude, so $p_1 \le \cdots \le p_r$, $q_1 \le \cdots \le q_s$. Since $p_1 \mid q_1 q_2 \cdots q_s$ and the $q_i$ are prime, we must have $p_1 = q_k$ for some k, but then $p_1 \ge q_1$. The same reasoning gives $q_1 \ge p_1$, so $p_1 = q_1$. Cancel this common factor to get

$$p_2 p_3 \cdots p_r = q_2 q_3 \cdots q_s.$$

Continuing, we can divide by all $p_i$ and get

$$1 = q_{r+1} q_{r+2} \cdots q_s.$$

But the $q_i$ were assumed to be > 1, so no factors can remain on the right-hand side. Then r = s and $p_i = q_i$ for all i, making the two factorizations identical.
81.28 root of unity

The fundamental theorem of algebra assures us that the polynomial $z^n - 1$ has n roots in C. That is, there exist n complex numbers z such that $z^n = 1$. These numbers are called roots of unity.

If $\zeta = e^{2\pi i/n} = \cos(2\pi/n) + i \sin(2\pi/n)$, then the n-th roots of unity are: $\zeta^k = e^{2\pi i k/n} = \cos(2\pi k/n) + i \sin(2\pi k/n)$ for k = 1, 2, ..., n.

If drawn on the complex plane, the n-th roots of unity are the vertices of a regular n-gon.
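The n-th roots of unity e^(2πik/n), k = 1, ..., n, can be listed numerically; this is a small sketch in which the function name `roots_of_unity` is an assumption of the example.

```python
import cmath

def roots_of_unity(n):
    """Return the n complex numbers exp(2*pi*i*k/n) for k = 1..n."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(1, n + 1)]

for z in roots_of_unity(6):
    # each root lies on the unit circle and satisfies z^6 = 1 up to rounding
    print(f"{z:.4f}  |z| = {abs(z):.4f}  z^6 = {z ** 6:.4f}")
```

The printed moduli are all 1 and consecutive arguments differ by 2π/n, which is the regular n-gon picture described above.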
Chapter 82


11-01 — Instructional exposition 
(textbooks, tutorial papers, etc.) 


82.1 base 


Most¹ written number systems are built upon the concept of base for their functioning and conveying of quantitative meaning. In these systems, meaning is derived from two things: symbols and places. The representation of a value then follows the schema:

$$\ldots s_2 s_1 s_0 \,.\, s_{-1} s_{-2} s_{-3} \ldots$$

where each $s_i$ is some symbol that has a quantitative value. Places to the left of the point (.) are worth whole units and places to the right are worth fractional units. It is the base that tells us how much of a fraction or how many whole units. Once a base b is chosen, the value of a number $s_2 s_1 s_0 . s_{-1} s_{-2} s_{-3}$ would be calculated like:

$$s_2 s_1 s_0 \,.\, s_{-1} s_{-2} s_{-3} = s_2 \cdot b^2 + s_1 \cdot b^1 + s_0 \cdot b^0 + s_{-1} \cdot b^{-1} + s_{-2} \cdot b^{-2} + s_{-3} \cdot b^{-3}$$

In our now-standard, Arabic-derived decimal system, the base b is equal to 10. Other very common and useful systems are binary, hexadecimal, and octal, having b = 2, b = 16, and b = 8 respectively.
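The place-value rule (each digit multiplied by the matching power of the base) can be written as a small function. This is a sketch under assumed conventions: digits are passed as lists of integer values, most significant first, and `value_in_base` is a name invented for this example. Exact fractions are used so that fractional places introduce no rounding.

```python
from fractions import Fraction

def value_in_base(whole, frac, b):
    """Value of the digit string whole.frac in base b.

    whole: digit values left of the point, most significant first
    frac:  digit values right of the point, in reading order
    """
    total = Fraction(0)
    for i, s in enumerate(reversed(whole)):     # places b^0, b^1, b^2, ...
        total += s * Fraction(b) ** i
    for i, s in enumerate(frac, start=1):       # places b^-1, b^-2, ...
        total += s * Fraction(b) ** (-i)
    return total

print(value_in_base([1, 0, 1], [1], 2))     # binary 101.1
print(value_in_base([15, 15], [], 16))      # hexadecimal FF
print(value_in_base([1, 2, 3], [4, 5], 10)) # decimal 123.45
```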


Each $s_i$ is a member of an alphabet² of symbols which must have b members. Intuitively this makes sense: when we try to represent the number which follows “9” in the decimal system, we know it must be “10”, since there is no symbol after “9.” Hence, place as well as symbol conveys the meaning, and base tells us how much a unit in each place is worth.

¹ But not all: see Roman numerals for an example of a baseless number system.

² These are generic systems which are capable of representing any number. By contrast, our system of written time is a curious hybrid of bases (60, 60, and then 10 from there on) and has a fixed number of whole places and a different number of symbols (24) in the highest place, making it capable only of representing the same discrete, finite set of values over and over again.


Curiously, though one would think that the choice of base leads to merely a different way of rendering the same information, there are instances where things are variously provable or proven in some bases, but not others. For instance, there exists a non-recursive formula for the nth binary digit of π, but not for decimal: one still must calculate all of the n − 1 preceding decimal digits of π to get the nth (see this paper).
Chapter 83


11-XX — Number theory 


83.1 Lehmer’s Conjecture 


If m is a positive quantity, to find a polynomial of the form

$$f(x) = x^n + a_1 x^{n-1} + \cdots + a_n$$

where the a’s are integers, such that the absolute value of the product of those roots of f which lie outside the unit circle, lies between 1 and 1 + m.

“This problem, of interest in itself, is especially important for our purposes. Whether or not the problem has a solution for m < 0.176 we do not know.” (Derrick Henry Lehmer, 1933.)

We define Mahler’s measure of a polynomial f to be the absolute value of the product of those roots of f which lie outside the unit circle, multiplied by the absolute value of the coefficient of the leading term of f. We shall denote it M(f).

Lehmer’s conjecture states that there exists a constant C > 1 such that every polynomial f with integer coefficients and M(f) > 1 has M(f) ≥ C.
83.2 Sierpinski conjecture

In 1960 Waclaw Sierpinski (1882-1969) proved the following interesting result:

Theorem: There exist infinitely many odd integers k such that $k 2^n + 1$ is composite for every n ≥ 1.

A multiplier k with this property is called a Sierpinski number. The Sierpinski problem consists in determining the smallest Sierpinski number. In 1962, John Selfridge discovered the Sierpinski number k = 78557, which is now believed to be in fact the smallest such number.

Conjecture: The integer k = 78557 is the smallest Sierpinski number. To prove the conjecture, it would be sufficient to exhibit a prime $k 2^n + 1$ for each odd k < 78557.
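That 78557 is a Sierpinski number can be checked with a covering set: every number 78557 · 2^n + 1 is divisible by at least one of the primes 3, 5, 7, 13, 19, 37, 73. This covering set is a well-known fact that is not stated in the entry above; the sketch below merely tests it over a finite range, and the names `COVERING_SET` and `small_factor` are assumptions of the example.

```python
COVERING_SET = (3, 5, 7, 13, 19, 37, 73)

def small_factor(k, n):
    """Return a prime from COVERING_SET dividing k * 2**n + 1, or None."""
    value = k * 2 ** n + 1
    for q in COVERING_SET:
        if value % q == 0:
            return q
    return None

# Which prime divides the number depends only on n modulo 36.
for n in range(1, 13):
    print(n, small_factor(78557, n))
```

A finite check is not a proof, but since the divisibility pattern is periodic in n with period 36, verifying one full period is enough to settle all n.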
83.3 prime triples conjecture

There exist infinitely many triples p, q, r of prime numbers such that P = pq + pr + qr is a prime.
Chapter 84


11A05 — Multiplicative structure; 
Euclidean algorithm; greatest 
common divisors 


84.1 Bezout’s lemma (number theory) 


Let a, b be integers, not both zero. Then there exist two integers x, y such that:

$$ax + by = \gcd(a, b).$$

This works not only on Z but on every integral domain where a Euclidean valuation has been defined.
84.2 Euclid’s algorithm

Euclid’s algorithm describes a procedure for finding the greatest common divisor of two integers.

Suppose a, b ∈ Z, and without loss of generality, b > 0, because if b < 0, then gcd(a, b) = gcd(a, |b|), and if b = 0, then gcd(a, b) = |a|. Put d := gcd(a, b).

By the division algorithm for integers, we may find integers $q_0$ and $r_0$ such that $a = q_0 b + r_0$ where $0 \le r_0 < b$.

Notice that $\gcd(a, b) = \gcd(b, r_0)$, because d | a and d | b, so $d \mid r_0 = a - q_0 b$, and if b and $r_0$ had a common divisor d′ larger than d, then d′ would also be a common divisor of a and b, contradicting d’s maximality. Thus, $d = \gcd(b, r_0)$.

So we may repeat the division, this time with b and $r_0$. Proceeding recursively, we obtain

$$a = q_0 b + r_0 \quad \text{with } 0 \le r_0 < b$$
$$b = q_1 r_0 + r_1 \quad \text{with } 0 \le r_1 < r_0$$
$$r_0 = q_2 r_1 + r_2 \quad \text{with } 0 \le r_2 < r_1$$
$$r_1 = q_3 r_2 + r_3 \quad \text{with } 0 \le r_3 < r_2$$
$$\vdots$$

Thus we obtain a decreasing sequence of nonnegative integers $b > r_0 > r_1 > r_2 > \ldots$, which must eventually reach zero, that is to say, $r_n = 0$ for some n, and the algorithm terminates. We may easily generalize the previous argument to show that $d = \gcd(r_{k-1}, r_k) = \gcd(r_k, r_{k+1})$ for k = 0, 1, 2, ..., where $r_{-1} = b$. Therefore,

$$d = \gcd(r_{n-1}, r_n) = \gcd(r_{n-1}, 0) = r_{n-1}.$$

More colloquially, the greatest common divisor is the last nonzero remainder in the algorithm.

The algorithm provides a bit more than this. It also yields a way to express d as a linear combination of a and b, a fact obscurely known as Bezout’s lemma. For we have that

$$a - q_0 b = r_0$$
$$b - q_1 r_0 = r_1$$
$$r_0 - q_2 r_1 = r_2$$
$$r_1 - q_3 r_2 = r_3$$
$$\vdots$$
$$r_{n-3} - q_{n-1} r_{n-2} = r_{n-1}$$
$$r_{n-2} = q_n r_{n-1}$$

so substituting each remainder $r_k$ into the next equation we obtain

$$b - q_1 (a - q_0 b) = k_1 a + l_1 b = r_1$$
$$(a - q_0 b) - q_2 (k_1 a + l_1 b) = k_2 a + l_2 b = r_2$$
$$(k_1 a + l_1 b) - q_3 (k_2 a + l_2 b) = k_3 a + l_3 b = r_3$$
$$\vdots$$
$$(k_{n-3} a + l_{n-3} b) - q_{n-1} (k_{n-2} a + l_{n-2} b) = k_{n-1} a + l_{n-1} b = r_{n-1}$$

Sometimes, especially for manual computations, it is preferable to write the whole algorithm in a tabular format. As an example, let us apply the algorithm to a = 756 and b = 595. The following table details the procedure. The variables at the top of each column (without subscripts) have the same meaning as above. That is to say, r is used for the sequence of remainders and q for the corresponding sequence of quotients. The entries in the k and l columns of the next row are obtained by multiplying the current values for k and l by the q in this row, and subtracting the results from the k and l in the previous row.

r | q | k | l
756 | | 1 | 0
595 | 1 | 0 | 1
161 | 3 | 1 | −1
112 | 1 | −3 | 4
49 | 2 | 4 | −5
14 | 3 | −11 | 14
7 | 2 | 37 | −47

Thus, gcd(756, 595) = 7 and 37 · 756 − 47 · 595 = 7.
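The remainder sequence and the coefficients k and l can be computed together in one loop. The following is a minimal sketch for nonnegative inputs; the function name `extended_euclid` is an assumption of this example, and the variable names follow the r, q, k, l notation used above.

```python
def extended_euclid(a, b):
    """Return (d, k, l) with d = gcd(a, b) = k*a + l*b, for integers a, b >= 0."""
    r_prev, r = a, b          # consecutive remainders
    k_prev, k = 1, 0          # coefficients of a
    l_prev, l = 0, 1          # coefficients of b
    while r != 0:
        q = r_prev // r
        # new row = row before last minus q times the last row
        r_prev, r = r, r_prev - q * r
        k_prev, k = k, k_prev - q * k
        l_prev, l = l, l_prev - q * l
    return r_prev, k_prev, l_prev

d, k, l = extended_euclid(756, 595)
print(d, k, l)                # 7 37 -47
```

The invariant maintained by the loop is that each remainder equals k·a + l·b for the coefficients stored alongside it, which is exactly the substitution argument above.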


Euclid’s algorithm was first described in his classic work Elements, which also contained procedures for geometrical constructions. These are the first known formally described algorithms. Prior to this, informally defined algorithms were in common use to perform various computations, but Elements contained the first attempt to rigorously describe a procedure and explain why its results are admissible. Euclid’s algorithm for the greatest common divisor is still commonly used today; since Elements was published in the fourth century BC, this algorithm has been in use for nearly 2400 years!
84.3 Euclid’s lemma

If a | bc, with gcd(a, b) = 1, then a | c.

84.4 Euclid’s lemma proof

We have a | bc, so bc = na, with n an integer. Dividing both sides by a, we have

$$\frac{bc}{a} = n.$$

But gcd(a, b) = 1 implies b/a is only an integer if a = ±1. So the factor a must be absorbed entirely by c, that is, c/a is an integer, which means a must divide c.

To make this last step rigorous, one can use Bezout’s lemma: since gcd(a, b) = 1, there exist integers x, y with ax + by = 1, so c = acx + bcy, and a divides both terms on the right.
84.5 fundamental theorem of arithmetic

Each natural number n > 1 can be decomposed uniquely, up to the order of the factors, as a product of prime numbers. This allows us to write n in the unique form

$$n = p_1^{a_1} p_2^{a_2} p_3^{a_3} \cdots p_k^{a_k}$$

for some nonnegative integer k with $p_i$ prime and $p_i \neq p_j$ for $i \neq j$. For some results it is also useful to assume that $p_i < p_j$ for i < j.
84.6 perfect number

An integer n is called perfect if it is the sum of all positive divisors of n less than n itself. It is not known if there are any odd perfect numbers, but all even perfect numbers have been classified as follows:

If $2^k - 1$ is prime for some k > 1, then $2^{k-1}(2^k - 1)$ is perfect, and every even perfect number is of this form.

Proof: (⇒) Let $p = 2^k - 1$ be prime, let $n = 2^{k-1} p$, and define σ(a) as the sum of all positive divisors of the integer a. Since σ is multiplicative (meaning σ(ab) = σ(a)σ(b) when gcd(a, b) = 1), we have:

$$\sigma(n) = \sigma(2^{k-1} p) = \sigma(2^{k-1}) \sigma(p) = (2^k - 1)(p + 1) = (2^k - 1) 2^k = 2n,$$

which shows n is perfect.

(⇐) Assume n is an even perfect number. Write $n = 2^{k-1} m$ for some odd m and k ≥ 2. Then we have $\gcd(2^{k-1}, m) = 1$, so

$$\sigma(n) = \sigma(2^{k-1} m) = \sigma(2^{k-1}) \sigma(m) = (2^k - 1) \sigma(m).$$

But if n is perfect, then by definition σ(n) = 2n, which in our case means $\sigma(n) = 2n = 2^k m$. Piecing together our two formulae for σ(n), we get

$$2^k m = (2^k - 1) \sigma(m).$$

So $(2^k - 1) \mid 2^k m$, which forces $(2^k - 1) \mid m$. Write $m = (2^k - 1) M$. So from above we have:

$$2^k m = (2^k - 1) \sigma(m)$$
$$2^k (2^k - 1) M = (2^k - 1) \sigma(m)$$
$$2^k M = \sigma(m)$$

Since m | m by definition and M | m by assumption, we have

$$2^k M = \sigma(m) \ge m + M = 2^k M,$$

which forces σ(m) = m + M. Thus m has only two divisors, m and M. Hence m must be prime, M = 1, and $m = (2^k - 1) M = 2^k - 1$, from which the result follows.
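The classification can be checked on small cases: whenever 2^k − 1 is prime, the number 2^(k−1)(2^k − 1) should satisfy σ(n) = 2n. The sketch below uses naive trial division; the helper names `sigma` and `is_prime` are assumptions of this example and are only practical for small inputs.

```python
def sigma(n):
    """Sum of all positive divisors of n, by trial division up to sqrt(n)."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:       # avoid counting a square root twice
                total += n // d
        d += 1
    return total

def is_prime(n):
    """n is prime exactly when its only divisors are 1 and n."""
    return n > 1 and sigma(n) == n + 1

# Even perfect numbers 2^(k-1) * (2^k - 1) for prime 2^k - 1, with k < 14.
perfect = [2 ** (k - 1) * (2 ** k - 1) for k in range(2, 14) if is_prime(2 ** k - 1)]
print(perfect)
print(all(sigma(n) == 2 * n for n in perfect))
```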
84.7 smooth number

n is a k-smooth number if all prime divisors of n are less than k.
Version: 1 Owner: bbukh Author(s): bbukh 




Chapter 85 


11A07 — Congruences; primitive 
roots; residue systems 


85.1 Anton’s congruence 

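For every n ∈ N, $(n!)_p$ stands for the product of the numbers between 1 and n which are not divisible by a given prime p. And we set $(0!)_p = 1$.

The corollary below generalizes a result first found by Anton, Stickelberger, and Hensel:

Let $N_0$ be the least non-negative residue of n (mod $p^s$), where p is a prime number and n ∈ N. Then

$$(n!)_p \equiv (\pm 1)^{\lfloor n/p^s \rfloor} \cdot (N_0!)_p \pmod{p^s},$$

where ±1 is −1 unless p = 2 and s ≥ 3.

Proof: We write each r in the product below as $i p^s + j$ to get

$$(n!)_p = \prod_{1 \le r \le n,\ p \nmid r} r$$
$$= \left( \prod_{i=0}^{\lfloor n/p^s \rfloor - 1} \ \prod_{1 \le j < p^s,\ p \nmid j} (i p^s + j) \right) \cdot \prod_{1 \le j \le N_0,\ p \nmid j} \left( \lfloor n/p^s \rfloor p^s + j \right)$$
$$\equiv \left( \prod_{i=0}^{\lfloor n/p^s \rfloor - 1} \ \prod_{1 \le j < p^s,\ p \nmid j} j \right) \cdot \prod_{1 \le j \le N_0,\ p \nmid j} j$$
$$\equiv \left( (p^s!)_p \right)^{\lfloor n/p^s \rfloor} \cdot (N_0!)_p \pmod{p^s}.$$

From Wilson’s theorem for prime powers it follows that

$$(n!)_p \equiv \begin{cases} (N_0!)_p & \text{for } p = 2,\ s \ge 3 \\ (-1)^{\lfloor n/p^s \rfloor} \cdot (N_0!)_p & \text{otherwise} \end{cases} \pmod{p^s}.$$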
85.2 Fermat’s Little Theorem proof (Inductive)

We must show

$$a^{p-1} \equiv 1 \pmod{p}$$

with p prime, and $p \nmid a$.

When a = 1, we have

$$1^{p-1} \equiv 1 \pmod{p}.$$

Now assume the theorem holds for some a. We have as a direct consequence that

$$a^p \equiv a \pmod{p},$$

a congruence that also holds trivially when p | a.

Let’s examine a + 1. By the binomial theorem we have

$$(a+1)^p = \binom{p}{0} a^p + \binom{p}{1} a^{p-1} + \cdots + \binom{p}{p-1} a + 1$$
$$\equiv a + \binom{p}{1} a^{p-1} + \binom{p}{2} a^{p-2} + \cdots + \binom{p}{p-1} a + 1$$
$$\equiv (a+1) + \left[ \binom{p}{1} a^{p-1} + \binom{p}{2} a^{p-2} + \cdots + \binom{p}{p-1} a \right] \pmod{p}.$$

However, note that the entire bracketed term is divisible by p, since each binomial coefficient $\binom{p}{k}$ with 0 < k < p is divisible by p. Hence

$$(a+1)^p \equiv (a+1) \pmod{p}.$$

Since p is prime, whenever $p \nmid (a+1)$ we can cancel an (a + 1) from both sides, giving

$$(a+1)^{p-1} \equiv 1 \pmod{p}.$$

Then by induction Fermat’s little theorem holds in general.
85.3 Jacobi symbol

Jacobi Symbol.

Let n be an odd positive integer with prime factorization $p_1^{e_1} \cdots p_k^{e_k}$. Let a > 0 be an integer. The Jacobi symbol $\left(\frac{a}{n}\right)$ is defined to be

$$\left(\frac{a}{n}\right) = \prod_{i=1}^{k} \left(\frac{a}{p_i}\right)^{e_i},$$

where $\left(\frac{a}{p_i}\right)$ is the Legendre symbol of a and $p_i$.
85.4 Shanks-Tonelli algorithm

The Shanks-Tonelli algorithm is a procedure for solving a congruence of the form $x^2 \equiv n \pmod{p}$, where p is an odd prime and n is a quadratic residue of p. In other words, it can be used to compute modular square roots.

First find positive integers Q and S such that $p - 1 = 2^S Q$, where Q is odd. Then find a quadratic nonresidue W of p and compute $V \equiv W^Q \pmod{p}$. Then find an integer n′ that is the multiplicative inverse of n (mod p) (i.e., $n n' \equiv 1 \pmod{p}$).

Compute

$$R \equiv n^{\frac{Q+1}{2}} \pmod{p}$$

and find the smallest integer i ≥ 0 that satisfies

$$(R^2 n')^{2^i} \equiv 1 \pmod{p}.$$

If i = 0, then x = R, and the algorithm stops. Otherwise, compute

$$R' \equiv R V^{2^{(S-i-1)}} \pmod{p}$$

and repeat the procedure for R = R′.
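The procedure just described (write p − 1 = 2^S · Q, take V = W^Q for a nonresidue W, start from R = n^((Q+1)/2), and correct R by powers of V until R^2 · n′ ≡ 1) translates directly into code. The sketch below follows those steps; the function name `shanks_tonelli` is an assumption of the example, and finding the nonresidue W by trial with Euler's criterion is one possible choice that the entry does not prescribe.

```python
def shanks_tonelli(n, p):
    """Return x with x*x % p == n % p, for an odd prime p and a nonzero quadratic residue n."""
    n %= p
    if pow(n, (p - 1) // 2, p) != 1:
        raise ValueError("n is not a nonzero quadratic residue modulo p")
    # Write p - 1 = 2^S * Q with Q odd.
    Q, S = p - 1, 0
    while Q % 2 == 0:
        Q //= 2
        S += 1
    # Find a quadratic nonresidue W by trial (Euler's criterion gives -1).
    W = 2
    while pow(W, (p - 1) // 2, p) != p - 1:
        W += 1
    V = pow(W, Q, p)
    n_inv = pow(n, -1, p)                 # multiplicative inverse n'; Python 3.8+
    R = pow(n, (Q + 1) // 2, p)
    while True:
        # Smallest i >= 0 with (R^2 * n')^(2^i) == 1 (mod p).
        t = R * R * n_inv % p
        i = 0
        while t != 1:
            t = t * t % p
            i += 1
        if i == 0:
            return R
        R = R * pow(V, 2 ** (S - i - 1), p) % p

print(shanks_tonelli(10, 13))   # a square root of 10 modulo 13
```

The other square root is always p minus the returned value. Each pass through the outer loop strictly lowers i, so at most S − 1 corrections are needed.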
85.5 Wieferich prime

By Fermat’s little theorem, the relationship $p \mid 2^{p-1} - 1$ holds for any odd prime p. An odd prime p such that $p^2 \mid 2^{p-1} - 1$ is called a Wieferich prime. It is currently unknown whether or not there are infinitely many Wieferich primes, or whether or not there are infinitely many primes that are not Wieferich, though the abc conjecture implies the latter.
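The defining condition, that p^2 divides 2^(p−1) − 1, is cheap to test with modular exponentiation. The sketch below searches a small range; the search bound 4000 and the helper `is_prime` are assumptions of this example.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Odd primes p below 4000 with 2^(p-1) congruent to 1 modulo p^2.
wieferich = [p for p in range(3, 4000, 2)
             if is_prime(p) and pow(2, p - 1, p * p) == 1]
print(wieferich)   # [1093, 3511]
```

These two values are the only Wieferich primes known; extensive computer searches have found no others.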
85.6 Wilson’s theorem for prime powers

For every natural number n, let $(n!)_p$ denote the product of the numbers 1 ≤ m ≤ n with gcd(m, p) = 1.

For prime p and s ∈ N

$$(p^s!)_p \equiv \begin{cases} 1 & \text{for } p = 2,\ s \ge 3 \\ -1 & \text{otherwise} \end{cases} \pmod{p^s}.$$

Proof: We pair up all factors of the product $(p^s!)_p$ into those numbers m where $m \not\equiv m^{-1} \pmod{p^s}$ and those where this is not the case. So $(p^s!)_p$ is congruent (modulo $p^s$) to the product of those numbers m where $m \equiv m^{-1} \pmod{p^s} \Leftrightarrow m^2 \equiv 1 \pmod{p^s}$.

Let p be an odd prime and s ∈ N. Since $2 \nmid p^s$, $p^s \mid (m^2 - 1)$ implies either $p^s \mid (m + 1)$ or $p^s \mid (m - 1)$. This leads to

$$(p^s!)_p \equiv -1 \pmod{p^s}$$

for odd prime p and any s ∈ N.

Now let p = 2 and s ≥ 2. Then

$$(1 + t \cdot 2^{s-1})^2 \equiv 1 \pmod{2^s}, \quad t = \pm 1.$$

Since

$$(2^{s-1} + 1)(2^{s-1} - 1) \equiv -1 \pmod{2^s},$$

we have

$$(p^s!)_p \equiv (-1) \cdot (-1) = 1 \pmod{p^s}$$

for p = 2, s ≥ 3, but −1 for s = 1, 2.
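The statement, that the product of the numbers up to p^s not divisible by p is ≡ 1 (mod p^s) when p = 2 and s ≥ 3 and ≡ −1 otherwise, can be verified directly for small prime powers. In this sketch the function name `coprime_factorial` and the list of test cases are assumptions of the example.

```python
def coprime_factorial(n, p, modulus):
    """(n!)_p reduced mod `modulus`: product of all 1 <= m <= n with p not dividing m."""
    result = 1
    for m in range(1, n + 1):
        if m % p != 0:
            result = result * m % modulus
    return result

for p, s in [(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (3, 1), (3, 2), (3, 3), (5, 2), (7, 2)]:
    q = p ** s
    value = coprime_factorial(q, p, q)
    # print the residue as +1 or -1 for readability
    print(p, s, 1 if value == 1 and q > 2 else -1)
```

Only the cases p = 2 with s ≥ 3 print 1; every other case prints −1.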
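85.7 factorial modulo prime powers

For n ∈ N and prime p, $(n!)_p$ is the product of the numbers 1 ≤ m ≤ n with $p \nmid m$.

For n, s ∈ N and prime number p, we have the congruence

$$\frac{n!}{p^{\sum_{i=1}^{d} \lfloor n/p^i \rfloor}} \equiv (\pm 1)^{\sum_{i=s}^{d} \lfloor n/p^i \rfloor} \prod_{i=0}^{d} (N_i!)_p \pmod{p^s},$$

where $N_i$ is the least non-negative residue of $\lfloor n/p^i \rfloor$ (mod $p^s$). d + 1 denotes the number of digits in the p-adic representation of n. More precisely, ±1 is −1 unless p = 2, s ≥ 3.

Proof: Let i ≥ 0. Then the set of numbers between 1 and $\lfloor n/p^i \rfloor$ is

$$\left\{ m \ge 1 : p \nmid m,\ m \le \left\lfloor \frac{n}{p^i} \right\rfloor \right\} \cup \left\{ pm : m \ge 1,\ m \le \left\lfloor \frac{n}{p^{i+1}} \right\rfloor \right\}.$$

This is true for every integer i with $p^{i+1} \le n$. So we have

$$\frac{\lfloor n/p^i \rfloor !}{\lfloor n/p^{i+1} \rfloor !} = p^{\lfloor n/p^{i+1} \rfloor} \left( \lfloor n/p^i \rfloor ! \right)_p. \qquad (85.7.1)$$

Multiplying all terms with 0 ≤ i ≤ d, where $p^d$ is the largest power of p not greater than n, the statement follows from the generalization of Anton’s congruence.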
85.8 proof of Euler-Fermat theorem

Let $a_1, a_2, \ldots, a_{\phi(n)}$ be all positive integers less than n which are coprime to n. Since gcd(a, n) = 1, the numbers $a a_1, a a_2, \ldots, a a_{\phi(n)}$ are each congruent to one of the integers $a_1, a_2, \ldots, a_{\phi(n)}$ in some order. Taking the product of these congruences, we get

$$(a a_1)(a a_2) \cdots (a a_{\phi(n)}) \equiv a_1 a_2 \cdots a_{\phi(n)} \pmod{n},$$

hence

$$a^{\phi(n)} (a_1 a_2 \cdots a_{\phi(n)}) \equiv a_1 a_2 \cdots a_{\phi(n)} \pmod{n}.$$

Since $\gcd(a_1 a_2 \cdots a_{\phi(n)}, n) = 1$, we can divide both sides by $a_1 a_2 \cdots a_{\phi(n)}$, and the desired result follows.
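85.9 proof of Lucas’s theorem

Let n ≥ m ∈ N. Let $a_0, b_0$ be the least non-negative residues of n, m (mod p), respectively. (Additionally, we set r = n − m, and $r_0$ is the least non-negative residue of r modulo p.) Then the statement follows from

$$\binom{n}{m} \equiv \binom{a_0}{b_0} \binom{\lfloor n/p \rfloor}{\lfloor m/p \rfloor} \pmod{p}.$$

We define the ’carry indicators’ $c_i$ for all i ≥ 0 as

$$c_i = \begin{cases} 1 & \text{if } b_i + r_i + c_{i-1} \ge p \\ 0 & \text{otherwise} \end{cases}$$

and additionally $c_{-1} = 0$, where $b_i$ and $r_i$ denote the i-th digits of m and r in base p.

The special case s = 1 of Anton’s congruence is:

$$(n!)_p \equiv (-1)^{\lfloor n/p \rfloor} \cdot a_0! \pmod{p}, \qquad (85.9.1)$$

where $a_0$ is as defined above, and $(n!)_p$ is the product of the numbers ≤ n not divisible by p. So we have

$$\frac{n!}{\lfloor n/p \rfloor ! \; p^{\lfloor n/p \rfloor}} = (n!)_p \equiv (-1)^{\lfloor n/p \rfloor} \cdot a_0! \pmod{p}.$$

When dividing by the left-hand terms of the congruences for m and r, we see that the power of p is

$$\left\lfloor \frac{n}{p} \right\rfloor - \left\lfloor \frac{m}{p} \right\rfloor - \left\lfloor \frac{r}{p} \right\rfloor = c_0.$$

So we get the congruence

$$\frac{\binom{n}{m}}{\binom{\lfloor n/p \rfloor}{\lfloor m/p \rfloor} \, p^{c_0}} \equiv (-1)^{c_0} \frac{a_0!}{b_0! \, r_0!} \pmod{p},$$

or equivalently

$$\left( -\frac{1}{p} \right)^{c_0} \binom{n}{m} \equiv \frac{a_0!}{b_0! \, r_0!} \binom{\lfloor n/p \rfloor}{\lfloor m/p \rfloor} \pmod{p}. \qquad (85.9.2)$$

When $c_0 = 0$ we have $a_0 = b_0 + r_0$, so (85.9.2) is exactly the congruence in the statement. Now we consider $c_0 = 1$. Since $a_0 = b_0 + r_0 - p c_0$,

$$c_0 = 1 \Leftrightarrow b_0 + r_0 \ge p \Leftrightarrow b_0 - (p - r_0) = a_0 < b_0 \Leftrightarrow \binom{a_0}{b_0} = 0.$$

So both congruences (the one in the statement and (85.9.2)) produce the same results.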
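Applying the one-digit congruence repeatedly gives the usual digit-by-digit form of Lucas's theorem: the binomial coefficient modulo p is the product of the binomial coefficients of corresponding base-p digits. The sketch below compares that product with a direct computation; the function name `binom_mod_p` is an assumption of the example, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binom_mod_p(n, m, p):
    """C(n, m) mod p for a prime p, multiplying C(a_i, b_i) over base-p digits."""
    result = 1
    while n > 0 or m > 0:
        # comb(a, b) is 0 when b > a, matching the carry case in the proof
        result = result * comb(n % p, m % p) % p
        n //= p
        m //= p
    return result

print(binom_mod_p(1000, 300, 13), comb(1000, 300) % 13)
```

The two printed values agree; the digit-wise version never has to form the large binomial coefficient itself.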
Chapter 86


11A15 — Power residues, reciprocity 


86.1 Euler’s criterion 


Let p be an odd prime and n an integer such that (n, p) = 1 (that is, n and p are relatively prime).

Then $(n|p) \equiv n^{(p-1)/2} \pmod{p}$, where (n|p) is the Legendre symbol.
86.2 Gauss’ lemma

Gauss’s lemma on quadratic residues (GL) is:

Proposition 1: Let p be an odd prime and let n be an integer which is not a multiple of p. Let u be the number of elements of the set

$$\left\{ n, 2n, 3n, \ldots, \frac{p-1}{2} n \right\}$$

whose least positive residues, modulo p, are greater than p/2. Then

$$\left(\frac{n}{p}\right) = (-1)^u,$$

where $\left(\frac{n}{p}\right)$ is the Legendre symbol.

That is, n is a quadratic residue modulo p when u is even and it is a quadratic nonresidue when u is odd.
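Proposition 1 gives a way to evaluate the Legendre symbol by counting: u is the number of multiples n, 2n, ..., ((p−1)/2)·n whose least positive residue exceeds p/2. The sketch below compares that count with Euler's criterion; both function names are assumptions of the example.

```python
def legendre_euler(n, p):
    """Legendre symbol via Euler's criterion: n^((p-1)/2) mod p, mapped to +1 or -1."""
    r = pow(n, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def legendre_gauss(n, p):
    """Legendre symbol via Gauss's lemma: (-1)^u, u counting residues above p/2."""
    u = sum(1 for k in range(1, (p - 1) // 2 + 1) if (k * n) % p > p / 2)
    return (-1) ** u

for n in range(1, 11):
    print(n, legendre_gauss(n, 11), legendre_euler(n, 11))
```

Both columns agree for every n not divisible by p; for p = 11 the residues are 1, 3, 4, 5, 9.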
GL is the special case

$$S = \left\{ 1, 2, \ldots, \frac{p-1}{2} \right\}$$

of the slightly more general statement below. Write $F_p$ for the field of p elements, and identify $F_p$ with the set {0, 1, ..., p − 1}, with its addition and multiplication mod p.

Proposition 2: Let S be a subset of $F_p^*$ such that x ∈ S or −x ∈ S, but not both, for any $x \in F_p^*$. For $n \in F_p^*$ let u(n) be the number of elements k ∈ S such that kn ∉ S. Then

$$\left(\frac{n}{p}\right) = (-1)^{u(n)}.$$

Proof: If a and b are distinct elements of S, we cannot have an = ±bn, in view of the hypothesis on S. Therefore

$$\prod_{a \in S} an = (-1)^{u(n)} \prod_{a \in S} a.$$

On the left we have

$$n^{(p-1)/2} \prod_{a \in S} a = \left(\frac{n}{p}\right) \prod_{a \in S} a$$

by Euler’s criterion. So

$$\left(\frac{n}{p}\right) \prod_{a \in S} a = (-1)^{u(n)} \prod_{a \in S} a.$$

The product is nonzero, hence can be cancelled, yielding the proposition.

Remarks: Using GL, it is straightforward to prove that for any odd prime p:

$$\left(\frac{-1}{p}\right) = \begin{cases} 1 & \text{if } p \equiv 1 \pmod{4} \\ -1 & \text{if } p \equiv -1 \pmod{4} \end{cases}$$

$$\left(\frac{2}{p}\right) = \begin{cases} 1 & \text{if } p \equiv \pm 1 \pmod{8} \\ -1 & \text{if } p \equiv \pm 3 \pmod{8} \end{cases}$$

The condition on S can also be stated like this: for any square $x^2 \in F_p^*$, there is a unique y ∈ S such that $x^2 = y^2$. Apart from the usual choice

$$S = \{1, 2, \ldots, (p-1)/2\},$$

the set

$$\{2, 4, \ldots, p - 1\}$$

has also been used, notably by Eisenstein. I think it was also Eisenstein who gave us this trigonometric identity, which is closely related to GL:

$$\left(\frac{n}{p}\right) = \prod_{a \in S} \frac{\sin(2\pi a n / p)}{\sin(2\pi a / p)}$$

It is possible to prove GL or Proposition 2 “from scratch”, without leaning on Euler’s criterion, the existence of a primitive root, or the fact that a polynomial over $F_p$ has no more zeros than its degree.
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86.3 Zolotarev’s lemma 


We will identify the [ring] Z, of integers] modulo n, with the set {0,1,...n— 1}. 


1: (Zolotarev) For any {prime number) p and any m € ZF, the Legendre symbol (2) 
is equal to the [signature] of the Tm : t+ MT of Z>. 


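Proof: Let us write $\epsilon(\sigma)$ for the signature of any permutation $\sigma$. If $\sigma$ is a circular
permutation on a set of $k$ elements, then $\epsilon(\sigma) = (-1)^{k-1}$.


Let $i$ be the order of $m$ in $\mathbb{Z}_p^*$. Then the permutation $\tau_m$ consists of $(p-1)/i$ orbits, each of
size $i$, whence

\[ \epsilon(\tau_m) = (-1)^{(i-1)(p-1)/i}. \]

If $i$ is even, then $m^{i/2} = -1$ and $i - 1$ is odd, so

\[ m^{(p-1)/2} = \left(m^{i/2}\right)^{(p-1)/i} = (-1)^{(p-1)/i} = \epsilon(\tau_m). \]

And if $i$ is odd, then $2i$ divides $p - 1$, so

\[ m^{(p-1)/2} = \left(m^{i}\right)^{(p-1)/(2i)} = 1 = \epsilon(\tau_m). \]

In both cases, the lemma follows from Euler's criterion.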
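
Lemma 1 can be checked directly for small primes. The sketch below is illustrative only (the function names are not from the original entry): it computes the signature of $x \mapsto mx$ from its cycle lengths and compares it with the Legendre symbol obtained from Euler's criterion.

```python
def permutation_sign(perm):
    """Signature of a permutation given as a dict x -> perm[x], from its cycle lengths."""
    seen, sign = set(), 1
    for start in perm:
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        if length % 2 == 0:  # a cycle of length k has signature (-1)^(k-1)
            sign = -sign
    return sign

def legendre(m, p):
    """Legendre symbol (m/p) for an odd prime p and p not dividing m."""
    r = pow(m, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def zolotarev_sign(m, p):
    """Signature of multiplication by m on the nonzero residues mod p."""
    return permutation_sign({x: (m * x) % p for x in range(1, p)})

for p in (3, 5, 7, 11, 13, 17, 19, 23):
    assert all(zolotarev_sign(m, p) == legendre(m, p) for m in range(1, p))
```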


Lemma 1 extends easily from the Legendre symbol to the Jacobi symbol $\left(\frac{m}{n}\right)$ for odd $n$.


The following is Zolotarev’s penetrating proof of the quadratic reciprocity law, using Lemma 
1. 


Lemma 2: Let $\lambda$ be the permutation of the set

\[ A_{mn} = \{0,1,\ldots,m-1\} \times \{0,1,\ldots,n-1\} \]

which maps the $k$th element of the sequence

\[ (0,0)(0,1)\ldots(0,n-1)(1,0)\ldots(1,n-1)(2,0)\ldots(m-1,n-1) \]

to the $k$th element of the sequence

\[ (0,0)(1,0)\ldots(m-1,0)(0,1)\ldots(m-1,1)(0,2)\ldots(m-1,n-1) \]

for every $k$ from 1 to $mn$. Then

\[ \epsilon(\lambda) = (-1)^{m(m-1)n(n-1)/4} \]

and if $m$ and $n$ are both odd,

\[ \epsilon(\lambda) = (-1)^{(m-1)(n-1)/4}. \]

Proof: We will use the fact that the signature of a permutation of a finite totally ordered set
is determined by the number of inversions of that permutation. The sequence $(0,0), (0,1), \ldots$
defines on $A_{mn}$ a total order $<$ in which the relation $(i,j) < (i',j')$ means

\[ i < i' \text{ or } (i = i' \text{ and } j < j'). \]

But $\lambda(i',j') < \lambda(i,j)$ means

\[ j' < j \text{ or } (j' = j \text{ and } i' < i). \]

The only pairs $((i,j),(i',j'))$ that get inverted are, therefore, the ones with $i < i'$ and $j > j'$.
There are indeed $\binom{m}{2}\binom{n}{2}$ such pairs, proving the first formula, and the second follows easily.


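Now let $p$ and $q$ be distinct odd primes. Denote by $\pi$ the canonical ring isomorphism
$\mathbb{Z}_{pq} \to \mathbb{Z}_p \times \mathbb{Z}_q$. Define two permutations $\alpha$ and $\beta$ of $\mathbb{Z}_p \times \mathbb{Z}_q$ by

\[ \alpha(x,y) = (qx + y,\ y) \]
\[ \beta(x,y) = (x,\ x + py). \]

Last, define a map $\lambda : \mathbb{Z}_{pq} \to \mathbb{Z}_{pq}$ by

\[ \lambda(x + qy) = px + y \]

for $x \in \{0,1,\ldots,q-1\}$ and $y \in \{0,1,\ldots,p-1\}$. Evidently $\lambda$ is a permutation.

We have

\[ \pi(qx + y) = (qx + y,\ y) \]
\[ \pi(x + py) = (x,\ x + py) \]

and therefore

\[ \pi \circ \lambda \circ \pi^{-1} \circ \alpha = \beta. \]

Let us compare the signatures of the two sides. For fixed $y$, the permutation $x \mapsto qx + y$ of $\mathbb{Z}_p$ is the
composition of $x \mapsto qx$ and $x \mapsto x + y$. The latter has signature 1, whence by Lemma 1,

\[ \epsilon(\alpha) = \left(\frac{q}{p}\right)^q = \left(\frac{q}{p}\right) \]

and similarly

\[ \epsilon(\beta) = \left(\frac{p}{q}\right)^p = \left(\frac{p}{q}\right). \]

By Lemma 2,

\[ \epsilon(\pi \circ \lambda \circ \pi^{-1}) = (-1)^{(p-1)(q-1)/4}. \]

Thus

\[ (-1)^{(p-1)(q-1)/4} \left(\frac{q}{p}\right) = \left(\frac{p}{q}\right) \]

which is the quadratic reciprocity law.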
Reference 


G. Zolotarev, Nouvelle démonstration de la loi de réciprocité de Legendre, Nouv. Ann. Math 
(2), 11 (1872), 354-362 
86.4 cubic reciprocity law


In a ring $\mathbb{Z}/n\mathbb{Z}$, a cubic residue is just a value of the function $x^3$ for some invertible element
$x$ of the ring. Cubic residues display a reciprocity phenomenon similar to that seen with
quadratic residues. But we need some preparation in order to state the cubic reciprocity
law.


$\omega$ will denote $\frac{-1 + i\sqrt{3}}{2}$, which is one of the complex cube roots of 1. $K$ will denote the ring
$K = \mathbb{Z}[\omega]$. The elements of $K$ are the complex numbers $a + b\omega$ where $a$ and $b$ are integers.
We define the norm $N : K \to \mathbb{Z}$ by

\[ N(a + b\omega) = a^2 - ab + b^2 \]

or equivalently

\[ N(z) = z\overline{z}. \]


Whereas $\mathbb{Z}$ has only two units (meaning invertible elements), namely $\pm 1$, $K$ has six, namely
all the sixth roots of 1:

\[ \pm 1 \qquad \pm\omega \qquad \pm\omega^2 \]

and we know $\omega^2 = -1 - \omega$. Two nonzero elements $\alpha$ and $\beta$ of $K$ are said to be associates
if $\alpha = \beta u$ for some unit $u$. This is an equivalence relation, and any nonzero element has six
associates.


$K$ is a principal ideal domain, hence has unique factorization. Let us call $\rho \in K$ "irreducible" if
the condition $\rho = \alpha\beta$ implies that $\alpha$ or $\beta$, but not both, is a unit. It turns out that the
irreducible elements of $K$ are (up to multiplication by units):

- the number $1 - \omega$, which has norm 3. We will denote it by $\pi$.

- positive real integers $q \equiv 2 \pmod 3$ which are prime in $\mathbb{Z}$. Such integers are called
rational primes in $K$.

- complex numbers $q = a + b\omega$ where $N(q)$ is a prime in $\mathbb{Z}$ and $N(q) \equiv 1 \pmod 3$.

For example, $3 + 2\omega$ is a prime in $K$ because its norm, $N(3 + 2\omega) = 9 - 6 + 4 = 7$, is prime in $\mathbb{Z}$ and is 1 mod 3; but
7 is not a prime in $K$.


Now we need some convention whereby at most one of any six associates is called a prime.
By convention, the following numbers are nominated:

- the number $\pi$.
- rational primes (rather than their negative or complex associates).
- complex numbers $q = a + b\omega$ where $N(q) \equiv 1 \pmod 3$ is prime in $\mathbb{Z}$ and

\[ a \equiv 2 \pmod 3 \]
\[ b \equiv 0 \pmod 3. \]

One can verify that this selection exists and is unambiguous.


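Next, we seek a three-valued function analogous to the two-valued quadratic residue character
$x \mapsto \left(\frac{x}{p}\right)$. Let $\rho$ be a prime in $K$, with $\rho \ne \pi$. If $\alpha$ is any element of $K$ such that $\rho \nmid \alpha$, then

\[ \alpha^{N(\rho) - 1} \equiv 1 \pmod \rho. \]

Since $N(\rho) - 1$ is a multiple of 3, we can define a function

\[ \chi_\rho : K \to \{0, 1, \omega, \omega^2\} \]

by

\[ \chi_\rho(\alpha) \equiv \alpha^{(N(\rho) - 1)/3} \pmod \rho \quad \text{if } \rho \nmid \alpha \]
\[ \chi_\rho(\alpha) = 0 \quad \text{if } \rho \mid \alpha. \]

$\chi_\rho$ is a character, called the cubic residue character mod $\rho$. We have $\chi_\rho(\alpha) = 1$ if and only
if $\alpha$ is a nonzero cube mod $\rho$. (Compare Euler's criterion.)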


At last we can state this famous result of Eisenstein and Jacobi:


Theorem (Cubic Reciprocity Law): If $\rho$ and $\sigma$ are any two distinct primes in $K$, neither
of them $\pi$, then

\[ \chi_\rho(\sigma) = \chi_\sigma(\rho). \]

The quadratic reciprocity law has two "supplements" which describe $\left(\frac{-1}{p}\right)$ and $\left(\frac{2}{p}\right)$.
Likewise the cubic law has this supplement, due to Eisenstein:


Theorem: For any prime $\rho$ in $K$, other than $\pi$,

\[ \chi_\rho(\pi) = \omega^{2m} \]

where

\[ m = (\rho + 1)/3 \quad \text{if } \rho \text{ is a rational prime} \]
\[ m = (a + 1)/3 \quad \text{if } \rho = a + b\omega \text{ is a complex prime.} \]

Remarks: Some writers refer to our "irreducible" elements as "primes" in $K$; what we have
called primes, they call "primary primes".


The quadratic reciprocity law would take a simpler form if we were to make a different
convention on what is a prime in $\mathbb{Z}$, a convention similar to the one in $K$: a prime in $\mathbb{Z}$ is
either 2 or an irreducible element $x$ of $\mathbb{Z}$ such that $x \equiv 1 \pmod 4$. The primes would then
be 2, -3, 5, -7, -11, 13, ... and the QRL would say simply

\[ \left(\frac{p}{q}\right) = \left(\frac{q}{p}\right) \]

for any two distinct odd primes $p$ and $q$.
86.5 proof of Euler's criterion


(All congruences are modulo $p$ for the proof; omitted for clarity.)
Let

\[ x = a^{(p-1)/2}. \]

Then $x^2 \equiv 1$ by Fermat's little theorem. Thus:

\[ x \equiv \pm 1. \]

Now consider the two possibilities:

• If $a$ is a quadratic residue then by definition, $a \equiv b^2$ for some $b$. Hence:

\[ x \equiv a^{(p-1)/2} \equiv b^{p-1} \equiv 1. \]

• It remains to show that $a^{(p-1)/2} \equiv -1$ if $a$ is a quadratic non-residue. We can proceed
in two ways:

Proof (a) Partition the set $\{1,\ldots,p-1\}$ into pairs $\{c,d\}$ such that $cd \equiv a$. Then
$c$ and $d$ must always be distinct since $a$ is a non-residue. Hence, the product of
the union of the partitions is:

\[ (p-1)! \equiv a^{(p-1)/2}, \]

and since $(p-1)! \equiv -1$ by Wilson's theorem, the result follows.

Proof (b) The equation:

\[ z^{(p-1)/2} \equiv 1 \]

has at most $(p-1)/2$ roots. But we already know of $(p-1)/2$ distinct roots of
the above equation, these being the quadratic residues modulo $p$. So $a$ can't be a
root, yet $a^{(p-1)/2} \equiv \pm 1$. Thus we must have:

\[ a^{(p-1)/2} \equiv -1. \]

QED
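
As a sanity check, Euler's criterion can be verified exhaustively for small primes. This is a minimal sketch with illustrative function names; it also checks Wilson's theorem, which Proof (a) relies on.

```python
from math import factorial

def euler_criterion_holds(p):
    """Check a^((p-1)/2) = 1 for residues and = -1 for non-residues, for all a mod p."""
    residues = {(b * b) % p for b in range(1, p)}
    for a in range(1, p):
        x = pow(a, (p - 1) // 2, p)
        expected = 1 if a in residues else p - 1  # p - 1 represents -1 mod p
        if x != expected:
            return False
    return True

def wilson_holds(p):
    """Check (p-1)! = -1 (mod p)."""
    return factorial(p - 1) % p == p - 1

for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
    assert euler_criterion_holds(p) and wilson_holds(p)
```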
86.6 proof of quadratic reciprocity rule


The quadratic reciprocity law is:
Theorem: (Gauss) Let $p$ and $q$ be distinct odd primes, and write $p = 2a + 1$ and $q = 2b + 1$.
Then

\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{ab}. \]

($\left(\frac{\cdot}{\cdot}\right)$ is the Legendre symbol.)
Proof: Let $R$ be the subset $[-a, a] \times [-b, b]$ of $\mathbb{Z} \times \mathbb{Z}$. Let $S$ be the interval

\[ [-(pq-1)/2,\ (pq-1)/2] \]

of $\mathbb{Z}$. By the Chinese remainder theorem, there exists a unique bijection $f : S \to R$ such
that, for any $s \in S$, if we write $f(s) = (x,y)$, then $x \equiv s \pmod p$ and $y \equiv s \pmod q$.
Let $P$ be the subset of $R$ consisting of the values of $f$ on $[1, (pq-1)/2]$. $P$ contains, say, $u$
elements of the form $(x, 0)$ such that $x < 0$, and $v$ elements of the form $(0, y)$ with $y < 0$.
Intending to apply Gauss's lemma, we seek some kind of comparison between $u$ and $v$.


We define three subsets of $P$ by

\[ R_0 = \{(x,y) \in P \mid x > 0,\ y > 0\} \]
\[ R_1 = \{(x,y) \in P \mid x < 0,\ y \ge 0\} \]
\[ R_2 = \{(x,y) \in P \mid x \ge 0,\ y < 0\} \]

and we let $N_i$ be the cardinal of $R_i$ for each $i$.


$P$ has $ab + b$ elements in the region $y > 0$, namely $f(m)$ for all $m$ of the form $k + lq$ with
$1 \le k \le b$ and $0 \le l \le a$. Thus

\[ N_0 + N_1 = ab + b - (b - v) + u \]

i.e.

\[ N_0 + N_1 = ab + u + v. \qquad (86.6.1) \]

Swapping $p$ and $q$, we have likewise

\[ N_0 + N_2 = ab + u + v. \qquad (86.6.2) \]

Furthermore, for any $s \in S$, if $f(s) = (x,y)$ then $f(-s) = (-x,-y)$. It follows that for any
$(x,y) \in R$ other than $(0,0)$, either $(x,y)$ or $(-x,-y)$ is in $P$, but not both. Therefore

\[ N_1 + N_2 = ab + u + v. \qquad (86.6.3) \]

Adding (1), (2), and (3) gives us

\[ 0 \equiv ab + u + v \pmod 2 \]

so that

\[ (-1)^{ab} = (-1)^u (-1)^v \]

which, in view of Gauss's lemma, is the desired conclusion.
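
The counting argument can be replayed by machine. In this sketch the names `sym`, `legendre` and `region_counts` are illustrative, not from the original entry. It builds the set $P$ for a pair of odd primes, counts $u$, $v$, $N_0$, $N_1$, $N_2$, and checks the three identities together with the conclusion.

```python
def sym(s, m):
    """Representative of s modulo the odd number m lying in [-(m-1)/2, (m-1)/2]."""
    r = s % m
    return r - m if r > (m - 1) // 2 else r

def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p and p not dividing n."""
    r = pow(n, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def region_counts(p, q):
    """Return (ab, u, v, N0, N1, N2) in the notation of the proof."""
    a, b = (p - 1) // 2, (q - 1) // 2
    P = [(sym(s, p), sym(s, q)) for s in range(1, (p * q - 1) // 2 + 1)]
    u = sum(1 for x, y in P if y == 0 and x < 0)
    v = sum(1 for x, y in P if x == 0 and y < 0)
    n0 = sum(1 for x, y in P if x > 0 and y > 0)
    n1 = sum(1 for x, y in P if x < 0 and y >= 0)
    n2 = sum(1 for x, y in P if x >= 0 and y < 0)
    return a * b, u, v, n0, n1, n2

for p, q in [(3, 5), (3, 7), (5, 7), (7, 11), (11, 13)]:
    ab, u, v, n0, n1, n2 = region_counts(p, q)
    assert n0 + n1 == n0 + n2 == n1 + n2 == ab + u + v
    # Gauss's lemma: (q/p) = (-1)^u and (p/q) = (-1)^v
    assert legendre(q, p) == (-1) ** u and legendre(p, q) == (-1) ** v
    assert legendre(p, q) * legendre(q, p) == (-1) ** ab
```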


For a bibliography of the more than 200 known proofs of the QRL, see [Lemmermeyer].
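86.7 quadratic character of 2


For any odd prime $p$, Gauss's lemma quickly yields

\[ \left(\frac{2}{p}\right) = 1 \quad \text{if } p \equiv \pm 1 \pmod 8 \qquad (86.7.1) \]

\[ \left(\frac{2}{p}\right) = -1 \quad \text{if } p \equiv \pm 3 \pmod 8 \qquad (86.7.2) \]

But there is another way, which goes back to Euler, and is worth seeing, inasmuch as it is
the prototype of certain more general arguments about character sums.


Let $\sigma$ be a primitive eighth root of unity in an algebraic closure of $\mathbb{Z}/p\mathbb{Z}$, and write
$\tau = \sigma + \sigma^{-1}$. We have $\sigma^4 = -1$, whence $\sigma^2 + \sigma^{-2} = 0$, whence

\[ \tau^2 = 2. \]

By the binomial formula, we have

\[ \tau^p = \sigma^p + \sigma^{-p}. \]

If $p \equiv \pm 1 \pmod 8$, this implies $\tau^p = \tau$. If $p \equiv \pm 3 \pmod 8$, we get instead $\tau^p = \sigma^3 + \sigma^{-3} =
-\sigma^{-1} - \sigma = -\tau$. Since $\tau^{p-1} = (\tau^2)^{(p-1)/2} = 2^{(p-1)/2} = \left(\frac{2}{p}\right)$ by Euler's criterion, this proves (1) and (2).
A variation of the argument, closer to Euler's, goes as follows. Write

\[ \sigma = \exp(2\pi i/8) \]

\[ \tau = \sigma + \sigma^{-1}. \]

Both are algebraic integers. Arguing much as above, we end up with

\[ \tau^{p-1} \equiv \left(\frac{2}{p}\right) \pmod p \]

which is enough.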
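
Formulas (1) and (2) are easy to confirm numerically. A minimal sketch (the function name is illustrative) that evaluates $\left(\frac{2}{p}\right)$ through Euler's criterion and compares it with the residue of $p$ modulo 8:

```python
def legendre_two(p):
    """Legendre symbol (2/p) for an odd prime p, via Euler's criterion."""
    r = pow(2, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41):
    expected = 1 if p % 8 in (1, 7) else -1  # p = +-1 (mod 8) gives 1, p = +-3 gives -1
    assert legendre_two(p) == expected
```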
86.8 quadratic reciprocity for polynomials


Let $F$ be a finite field of characteristic $p$, and let $f$ and $g$ be distinct monic irreducible
(non-constant) polynomials in the polynomial ring $F[X]$. Define the Legendre symbol $\left(\frac{f}{g}\right)$
by

\[ \left(\frac{f}{g}\right) = \begin{cases} 1 & \text{if } f \text{ is a square in the quotient ring } F[X]/(g), \\ -1 & \text{otherwise.} \end{cases} \]

The quadratic reciprocity theorem for polynomials over a finite field states that

\[ \left(\frac{f}{g}\right)\left(\frac{g}{f}\right) = (-1)^{\frac{p-1}{2} \deg(f) \deg(g)}. \]
86.9 quadratic reciprocity rule


The quadratic reciprocity rule states that

\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{(p-1)(q-1)/4} \]

where $\left(\frac{\cdot}{\cdot}\right)$ is the Legendre symbol, $p$ and $q$ are odd and prime, and at least one of $p$ or $q$ is
positive.
Note that the Legendre symbol may also appear as $(p \mid q)$.
86.10 quadratic residue


Let $a, n$ be relatively prime integers. If there exists an integer $x$ such that

\[ x^2 \equiv a \pmod n \]

then $a$ is said to be a quadratic residue of $n$. Otherwise, $a$ is called a quadratic
non-residue of $n$.
Chapter 87


11A25 — Arithmetic functions; related 
numbers; inversion formulas 


87.1 Dirichlet character 


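A Dirichlet character mod $m$ is a group homomorphism $\chi$ from $\left(\mathbb{Z}/m\mathbb{Z}\right)^*$ to $\mathbb{C}^*$. The function
$\gamma : \mathbb{Z} \to \mathbb{C}$ given by

\[ \gamma(n) = \begin{cases} \chi(n \bmod m) & (\gcd(n,m) = 1) \\ 0 & (\gcd(n,m) \ne 1) \end{cases} \]

is also referred to as a Dirichlet character. The Dirichlet characters mod $m$ form a group if
one defines $\chi\chi'$ to be the function which takes $a \in \left(\mathbb{Z}/m\mathbb{Z}\right)^*$ to $\chi(a)\chi'(a)$. It turns out that
this resulting group is isomorphic to $\left(\mathbb{Z}/m\mathbb{Z}\right)^*$. The trivial character is given by $\chi(a) = 1$ for
all $a \in \left(\mathbb{Z}/m\mathbb{Z}\right)^*$, and it acts as the identity element for the group. A character $\chi$ mod $m$ is said to be
primitive if it does not arise as the composite

\[ \left(\mathbb{Z}/m\mathbb{Z}\right)^* \to \left(\mathbb{Z}/m'\mathbb{Z}\right)^* \to \mathbb{C}^* \]

for any proper divisor $m' \mid m$, where the first map is the natural projection and the second map
is a character mod $m'$. If $\chi$ is non-primitive, the gcd of all such $m'$ is called the conductor
of $\chi$.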
87.2 Liouville function


The Liouville function is defined by $\lambda(1) = 1$ and $\lambda(n) = (-1)^{k_1 + k_2 + \cdots + k_r}$ if the prime
factorization of $n > 1$ is $n = p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}$. This function is completely multiplicative and satisfies the
identity

\[ \sum_{d \mid n} \lambda(d) = \begin{cases} 1 & \text{if } n = m^2 \text{ for some integer } m \\ 0 & \text{otherwise.} \end{cases} \]
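
The divisor-sum identity can be tested directly. The following sketch uses illustrative names and plain trial division; it is meant as a check of the statement, not as an efficient implementation.

```python
from math import isqrt

def liouville(n):
    """lambda(n) = (-1)^(number of prime factors of n, counted with multiplicity)."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:  # one prime factor larger than the square root remains
        count += 1
    return (-1) ** count

def liouville_divisor_sum(n):
    """Sum of lambda(d) over the positive divisors d of n."""
    return sum(liouville(d) for d in range(1, n + 1) if n % d == 0)

for n in range(1, 200):
    is_square = isqrt(n) ** 2 == n
    assert liouville_divisor_sum(n) == (1 if is_square else 0)
```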
87.3 Mangoldt function


The Mangoldt function $\Lambda$ is defined by

\[ \Lambda(n) = \begin{cases} \ln p, & \text{if } n = p^k, \text{ where } p \text{ is a prime and } k \text{ is a natural number} \ge 1 \\ 0, & \text{otherwise.} \end{cases} \]

The Moebius inversion formula leads to the identity $\Lambda(n) = \sum_{d \mid n} \mu(n/d) \ln d = -\sum_{d \mid n} \mu(d) \ln d$.
87.4 Mertens' first theorem


For any real number $x \ge 2$ we have

\[ \sum_{p \le x} \frac{\ln p}{p} = \ln x + O(1), \]

where the sum runs over all prime numbers $p \le x$.
Moreover, the term $O(1)$ arising in this formula lies in the open interval $(-1 - \ln 4, \ln 4)$.
87.5 Moebius function


For a positive integer $n$, define $\mu$ by

\[ \mu(n) = \begin{cases} 1, & \text{if } n = 1 \\ 0, & \text{if } p^2 \mid n \text{ for some prime } p \\ (-1)^r, & \text{if } n = p_1 p_2 \cdots p_r, \text{ where the } p_i \text{ are distinct primes.} \end{cases} \]

In other words, $\mu(n) = 0$ if $n$ is not a square-free integer, while $\mu(n) = (-1)^r$ if $n$ is square-free
with $r$ prime factors. The function $\mu$ is a multiplicative function, and obeys the identity

\[ \sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1 \\ 0 & \text{if } n > 1 \end{cases} \]

where $d$ runs through the positive divisors of $n$.
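
The definition translates into a few lines of code. This is a sketch by trial division (the name `mobius` is illustrative); the final loop checks the divisor-sum identity stated above.

```python
def mobius(n):
    """Moebius function: 0 if a square divides n, else (-1)^(number of prime factors)."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:  # d^2 divided the original n
                return 0
            result = -result
        d += 1
    if n > 1:  # one remaining prime factor
        result = -result
    return result

for n in range(1, 300):
    total = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert total == (1 if n == 1 else 0)
```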
87.6 Moebius inversion


The Möbius function and inversion formula


For any integer $n \ge 1$, let $\mu(n)$ be 0 if $n$ is divisible by the square of a prime number and if
not, let $\mu(n) = (-1)^{k(n)}$ where $k(n)$ is the number of primes which divide $n$. The resulting
function $\mu : \mathbb{N}^* \to \{-1, 0, 1\}$ is called the Möbius $\mu$ function, or just the Möbius function.


Proposition 1: $\mu$ is the unique mapping $\mathbb{N}^* \to \mathbb{Z}$ such that

\[ \mu(1) = 1 \qquad (87.6.1) \]
\[ \sum_{d \mid n} \mu(d) = 0 \text{ for all } n > 1 \qquad (87.6.2) \]

Proof: By induction, there can only be one function with these properties. $\mu$ clearly satisfies
(1), so take some $n > 1$. Let $p$ be some prime factor of $n$, and let $m$ be the product of all
the prime factors of $n$.

\[ \sum_{d \mid n} \mu(d) = \sum_{d \mid m} \mu(d) \]
\[ = \sum_{d \mid m,\ p \nmid d} \mu(d) + \sum_{d \mid m,\ p \mid d} \mu(d) \]
\[ = \sum_{d \mid m/p} \mu(d) - \sum_{d \mid m/p} \mu(d) \]
\[ = 0 \]

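Proposition 2: Let $f$ and $g$ be two mappings of $\mathbb{N}^*$ into some given additive group. The
conditions

\[ f(n) = \sum_{d \mid n} g(d) \text{ for all } n \in \mathbb{N}^* \qquad (87.6.3) \]

\[ g(n) = \sum_{d \mid n} \mu(d) f\left(\frac{n}{d}\right) \text{ for all } n \in \mathbb{N}^* \qquad (87.6.4) \]

are equivalent.


Proof: Fix some $n \in \mathbb{N}^*$. Assuming (3), we have

\[ \sum_{d \mid n} \mu(d) f\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d) \sum_{e \mid n/d} g(e) \]
\[ = \sum_{k \mid n} \sum_{d \mid k} \mu(d)\, g\left(\frac{n}{k}\right) \]
\[ = \sum_{k \mid n} g\left(\frac{n}{k}\right) \sum_{d \mid k} \mu(d) \]
\[ = g(n) \text{ by Proposition 1.} \]

Conversely, assuming (4), we get

\[ \sum_{d \mid n} g(d) = \sum_{d \mid n} \sum_{e \mid d} \mu(e) f\left(\frac{d}{e}\right) \]
\[ = \sum_{k \mid n} f(k) \sum_{e \mid n/k} \mu(e) \]
\[ = f(n) \text{ by Proposition 1} \]

as claimed.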
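
The equivalence of (3) and (4) can be illustrated with integer-valued functions. In this sketch the names `transform` and `invert` and the sample function `g` are arbitrary choices made for the example: `f` is built from `g` by (3), and `g` is then recovered from `f` by (4).

```python
def mobius(n):
    """Moebius function by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def transform(g, n):
    """f(n) = sum of g(d) over d | n, as in (3)."""
    return sum(g(d) for d in divisors(n))

def invert(f, n):
    """g(n) = sum of mu(d) * f(n/d) over d | n, as in (4)."""
    return sum(mobius(d) * f(n // d) for d in divisors(n))

g = lambda n: n * n + 1           # an arbitrary test function
f = lambda n: transform(g, n)     # its Moebius transform
assert all(invert(f, n) == g(n) for n in range(1, 60))
```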


Definitions: In the notation of Proposition 2, $f$ is called the Möbius transform of $g$, and
(4) is called the Möbius inversion formula.


Möbius-Rota inversion


G.-C. Rota has described a generalization of the Möbius formalism. In it, the set $\mathbb{N}^*$, ordered
by the relation $x \mid y$ between elements $x$ and $y$, is replaced by a more general ordered set, and
$\mu$ is replaced by a function of two variables.


Let $(S, \le)$ be a locally finite ordered set, i.e. an ordered set such that $\{z \in S \mid x \le z \le y\}$ is
a finite set for all $x, y \in S$. Let $A$ be the set of functions $\alpha : S \times S \to \mathbb{Z}$ such that

\[ \alpha(x,x) = 1 \text{ for all } x \in S \qquad (87.6.5) \]
\[ \alpha(x,y) \ne 0 \text{ implies } x \le y \qquad (87.6.6) \]

$A$ becomes a monoid if we define the product of any two of its elements, say $\alpha$ and $\beta$, by

\[ (\alpha\beta)(x,y) = \sum_{t \in S} \alpha(x,t)\,\beta(t,y). \]

The sum makes sense because $\alpha(x,t)\beta(t,y)$ is nonzero for only finitely many values of $t$.
(Clearly this definition is akin to the definition of the product of two square matrices.)

Consider the element $\iota$ of $A$ defined simply by

\[ \iota(x,y) = \begin{cases} 1 & \text{if } x \le y \\ 0 & \text{otherwise.} \end{cases} \]

The function $\iota$, regarded as a matrix over $\mathbb{Z}$, has an inverse matrix, say $\nu$. That means

\[ \sum_{t \in S} \iota(x,t)\,\nu(t,y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise.} \end{cases} \]

Thus for any $f, g \in A$, the equations

\[ f = \iota g \qquad (87.6.7) \]
\[ g = \nu f \qquad (87.6.8) \]

are equivalent.


Now let's sketch out how the traditional Möbius inversion is a special case of Rota's notion.
Let $S$ be the set $\mathbb{N}^*$, ordered by the relation $x \mid y$ between elements $x$ and $y$. In this case, $\nu$
is essentially a function of only one variable:


Proposition 3: With the above notation, $\nu(x,y) = \mu(y/x)$ for all $x, y \in \mathbb{N}^*$ such that $x \mid y$.


The proof is fairly straightforward, by induction on the number of elements of the interval
$\{z \in S \mid x \le z \le y\}$.


Now let $g$ be a function from $\mathbb{N}^*$ to some additive group, and write $\overline{g}(x,y) = g(y/x)$ for all
pairs $(x,y)$ such that $x \mid y$. The equivalence of (7) and (8), for $g$ and its transform $\overline{g}$, is just
Proposition 2.


Example: Let $E$ be a set, and let $S$ be the set of all finite subsets of $E$, ordered by inclusion.
The ordered set $S$ is left-finite, and for any $x, y \in S$ such that $x \subset y$, we have $\nu(x,y) =
(-1)^{|y - x|}$, where $|z|$ denotes the cardinal of the finite set $z$.


A slightly more sophisticated example comes up in connection with the 
of a [graph] or 
87.7 arithmetic function


An arithmetic function is a function $f : \mathbb{Z}^+ \to \mathbb{C}$ from the positive integers to the
complex numbers.

There are two noteworthy operations on the set of arithmetic functions:

If $f$ and $g$ are two arithmetic functions, the sum of $f$ and $g$, denoted $f + g$, is given by

\[ (f + g)(n) = f(n) + g(n), \]

and the Dirichlet convolution of $f$ and $g$, denoted by $f * g$, is given by

\[ (f * g)(n) = \sum_{d \mid n} f(d)\, g\left(\frac{n}{d}\right). \]

The set of arithmetic functions, equipped with these two binary operations, forms a commutative ring with unity.
The 0 of the ring is the function $f$ such that $f(n) = 0$ for any positive integer $n$. The 1 of
the ring is the function $f$ with $f(1) = 1$ and $f(n) = 0$ for any $n > 1$.
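
A short sketch of the Dirichlet convolution follows. The names `convolve`, `unity`, `const_one` and `identity` are chosen for this example only. It checks that the function described above as the 1 of the ring really is neutral, and builds two familiar divisor functions as convolutions.

```python
def convolve(f, g):
    """Dirichlet convolution: (f * g)(n) = sum over d | n of f(d) * g(n / d)."""
    def h(n):
        return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)
    return h

def unity(n):
    """The 1 of the ring: 1 at n = 1 and 0 elsewhere."""
    return 1 if n == 1 else 0

def const_one(n):
    return 1

def identity(n):
    return n

num_divisors = convolve(const_one, const_one)  # counts the divisors of n
sum_divisors = convolve(identity, const_one)   # adds up the divisors of n

for n in range(1, 100):
    assert convolve(identity, unity)(n) == identity(n)
    assert convolve(identity, const_one)(n) == convolve(const_one, identity)(n)
```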
87.8 multiplicative function


In number theory, a multiplicative function is an arithmetic function $f(n)$ of the positive
integer $n$ with the property that $f(1) = 1$ and whenever $a$ and $b$ are coprime, then:

\[ f(ab) = f(a) f(b). \]

An arithmetic function $f(n)$ is said to be completely multiplicative if $f(1) = 1$ and $f(ab) =
f(a) f(b)$ holds for all positive integers $a$ and $b$, even when they are not coprime. In this case
the function is a homomorphism of monoids and, because of the fundamental
theorem of arithmetic, is completely determined by its restriction to the prime numbers. Every completely
multiplicative function is multiplicative.


Outside number theory, the term multiplicative is usually used for all functions with the
property $f(ab) = f(a) f(b)$ for all arguments $a$ and $b$. This article discusses number theoretic
multiplicative functions.


Examples Examples of multiplicative functions include many functions of
importance in number theory, such as:

• $\phi(n)$: the Euler totient function, counting the positive integers coprime to $n$,
• $\mu(n)$: the Moebius function, related to the number of prime factors of square-free numbers,
• $d(n)$: the number of positive divisors of $n$,
• $\sigma(n)$: the sum of all the positive divisors of $n$,
• $\sigma_k(n)$: the sum of the $k$-th powers of all the positive divisors of $n$ (where $k$ may be any
complex number),
• $\mathrm{Id}(n)$: the identity function, defined by $\mathrm{Id}(n) = n$,
• $\mathrm{Id}_k(n)$: the power functions, defined by $\mathrm{Id}_k(n) = n^k$ for any natural number (or even
complex number) $k$,
• $1(n)$: the constant function, defined by $1(n) = 1$,
• $\varepsilon(n)$: the function defined by:

\[ \varepsilon(n) = \sum_{d \mid n} \mu(d) = \begin{cases} 1, & \text{if } n = 1 \\ 0, & \text{if } n > 1, \end{cases} \]

where $d$ runs through the positive divisors of $n$.


An example of a non-multiplicative function is the arithmetic function $r_2(n)$, the number
of representations of $n$ as a sum of squares of two integers, positive, negative, or zero, where
in counting the number of ways, reversal of order is allowed. For example:

\[ 1 = 1^2 + 0^2 = (-1)^2 + 0^2 = 0^2 + 1^2 = 0^2 + (-1)^2 \]

and therefore $r_2(1) = 4 \ne 1$. This shows that the function is not multiplicative. However,
$r_2(n)/4$ is multiplicative.


Properties A multiplicative function is completely determined by its values at the powers
of prime numbers, a consequence of the fundamental theorem of arithmetic. Thus, if $n$ is a
product of powers of distinct prime numbers, say $n = p^a q^b \cdots$, then $f(n) = f(p^a) f(q^b) \cdots$.
This property of multiplicative functions significantly reduces the need for computation, as
in the following examples for $n = 144 = 2^4 \cdot 3^2$:

\[ d(144) = \sigma_0(144) = \sigma_0(2^4)\sigma_0(3^2) = (1^0 + 2^0 + 4^0 + 8^0 + 16^0)(1^0 + 3^0 + 9^0) = 5 \cdot 3 = 15, \]
\[ \sigma(144) = \sigma_1(144) = \sigma_1(2^4)\sigma_1(3^2) = (1^1 + 2^1 + 4^1 + 8^1 + 16^1)(1^1 + 3^1 + 9^1) = 31 \cdot 13 = 403, \]
\[ \sigma_2(144) = \sigma_2(2^4)\sigma_2(3^2) = (1^2 + 2^2 + 4^2 + 8^2 + 16^2)(1^2 + 3^2 + 9^2) = 341 \cdot 91 = 31031, \]
\[ \sigma_3(144) = \sigma_3(2^4)\sigma_3(3^2) = (1^3 + 2^3 + 4^3 + 8^3 + 16^3)(1^3 + 3^3 + 9^3) = 4681 \cdot 757 = 3543517. \]

Similarly, we have:

\[ \phi(144) = \phi(2^4)\phi(3^2) = 8 \cdot 6 = 48. \]
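
The arithmetic for $n = 144$ can be reproduced by brute force. The sketch below uses illustrative helper names; it sums over all divisors directly and compares with the product over the prime powers $2^4$ and $3^2$.

```python
from math import gcd

def sigma(k, n):
    """Sum of the k-th powers of the positive divisors of n."""
    return sum(d ** k for d in range(1, n + 1) if n % d == 0)

def phi(n):
    """Euler totient: the count of 1 <= m <= n coprime to n."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)

# Multiplicativity across the coprime factors 16 and 9 of 144.
for k in range(4):
    assert sigma(k, 144) == sigma(k, 16) * sigma(k, 9)
assert phi(144) == phi(16) * phi(9)

values = [sigma(k, 144) for k in range(4)]
```

Here `values` holds $\sigma_0(144)$ through $\sigma_3(144)$ in order.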
Convolution If $f$ and $g$ are two arithmetic functions, one defines a new arithmetic function
$f * g$, the convolution of $f$ and $g$, by:

\[ (f * g)(n) = \sum_{d \mid n} f(d)\, g\left(\frac{n}{d}\right), \]

where the sum extends over all positive divisors $d$ of $n$. Some general properties of this
operation include (here the argument $n$ is omitted in all functions):

• If both $f$ and $g$ are multiplicative, then so is $f * g$,
• $f * g = g * f$,
• $(f * g) * h = f * (g * h)$,
• $f * \varepsilon = \varepsilon * f = f$.

This shows that the multiplicative functions with the convolution form a commutative monoid
with the identity element $\varepsilon$. Relations among the multiplicative functions discussed above
include:

• $\mu * 1 = \varepsilon$ (the Moebius inversion formula),
• $1 * 1 = d$,
• $\mathrm{Id} * 1 = \sigma$,
• $\mathrm{Id}_k * 1 = \sigma_k$,
• $\phi * 1 = \mathrm{Id}$.
87.9 non-multiplicative function


In number theory, a non-multiplicative function is an arithmetic function $f(n)$ of the
positive integer $n$ which is not multiplicative.

Examples Some examples of non-multiplicative functions are the arithmetic functions:

• $r_2(n)$, the number of representations of $n$ as a sum of squares of two integers,
positive, negative or zero, where representations differing in order or sign are counted separately,

• $c_4(n)$, the number of ways that $n$ can be expressed as the sum of four squares of
nonnegative integers, where we distinguish between different orders of the summands.
For example:

\[ 1 = 1^2 + 0^2 + 0^2 + 0^2 = 0^2 + 1^2 + 0^2 + 0^2 = 0^2 + 0^2 + 1^2 + 0^2 = 0^2 + 0^2 + 0^2 + 1^2, \]

hence $c_4(1) = 4 \ne 1$.

• The partition function $P(n)$, the number of representations of $n$ as a sum of
positive integers, disregarding the order of the summands. For instance:

\[ P(2 \cdot 5) = P(10) = 42 \text{ and} \]
\[ P(2)P(5) = 2 \cdot 7 = 14 \ne 42. \]

• The prime counting function $\pi(n)$. Here we first have $\pi(1) = 0 \ne 1$, and then we have,
for example:

\[ \pi(2 \cdot 5) = \pi(10) = 4 \text{ and} \]
\[ \pi(2)\pi(5) = 1 \cdot 3 = 3 \ne 4. \]

• The Mangoldt function $\Lambda(n)$. $\Lambda(1) = \ln 1 = 0 \ne 1$ and for example:

\[ \Lambda(2 \cdot 5) = \Lambda(10) = 0 \text{ and} \]
\[ \Lambda(2)\Lambda(5) = \ln 2 \cdot \ln 5 \ne 0. \]

One might think that multiplicativity of $\Lambda(n)$ holds for some $n$, as in:

\[ \Lambda(2 \cdot 6) = \Lambda(12) = 0 \text{ and} \]
\[ \Lambda(2)\Lambda(6) = \ln 2 \cdot 0 = 0, \]

but 2 and 6 are not coprime; for the coprime factorization $12 = 2^2 \cdot 3$ we have to write:

\[ \Lambda(2^2)\Lambda(3) = \ln 2 \cdot \ln 3 \ne 0. \]
87.10 totient


A totient is a sequence $f : \{1,2,3,\ldots\} \to \mathbb{C}$ such that

\[ g * f = h \]

for some two completely multiplicative sequences $g$ and $h$, where $*$ denotes the convolution
product (or Dirichlet product; see multiplicative function).


The term 'totient' was introduced by Sylvester in the 1880's, but is seldom used nowadays
except in two cases. The Euler totient $\phi$ satisfies

\[ \iota_0 * \phi = \iota_1 \]

where $\iota_k$ denotes the function $n \mapsto n^k$ (which is completely multiplicative). The more general
Jordan totient $J_k$ is defined by

\[ \iota_0 * J_k = \iota_k. \]
87.11 unit


Let $R$ be a ring with multiplicative identity 1. We say that $u \in R$ is a unit (or unital) if $u$
divides 1 (denoted $u \mid 1$). That is, there exists an $r \in R$ such that $1 = ur = ru$.
Chapter 88


11A41 — Primes 


88.1 Chebyshev functions 


There are two different functions which are collectively known as the Chebyshev functions:

\[ \vartheta(x) = \sum_{p \le x} \log p \]

where the notation used indicates the summation over all positive primes $p$ less than or equal
to $x$, and

\[ \psi(x) = \sum_{p \le x} k \log p, \]

where the same summation notation is used and $k$ denotes the unique integer such that
$p^k \le x$ but $p^{k+1} > x$. Heuristically, the first of these two functions measures the number
of primes less than $x$ and the second does the same, but weighting each prime in accordance
with its logarithmic relationship to $x$.


Many innocuous results in number theory owe their proof to a relatively simple analysis of
the asymptotics of one or both of these functions. For example, the fact that for any $n$, we
have

\[ \prod_{p \le n} p < 4^n \]

is equivalent to the statement that $\vartheta(x) < x \log 4$.
A somewhat less innocuous result is that the prime number theorem (i.e. that $\pi(x) \sim \frac{x}{\log x}$)
is equivalent to the statement that $\vartheta(x) \sim x$, which in turn, is equivalent to the statement
that $\psi(x) \sim x$.
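
Both functions are simple to compute for small arguments. The following sketch uses the illustrative names `primes_up_to`, `theta` and `psi`, with a basic sieve. It also checks the bound $\vartheta(x) < x \log 4$ on a small range, which is of course no substitute for the proof.

```python
from math import log

def primes_up_to(n):
    """List of primes <= n by a simple sieve."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def theta(x):
    """First Chebyshev function: sum of log p over primes p <= x."""
    return sum(log(p) for p in primes_up_to(int(x)))

def psi(x):
    """Second Chebyshev function: sum of k * log p, with p^k <= x < p^(k+1)."""
    total = 0.0
    for p in primes_up_to(int(x)):
        k, power = 0, p
        while power <= x:
            k += 1
            power *= p
        total += k * log(p)
    return total

assert all(theta(x) < x * log(4) for x in range(1, 500))
```

For instance, `theta(10)` is $\log(2 \cdot 3 \cdot 5 \cdot 7)$ and `psi(10)` is $\log(2^3 \cdot 3^2 \cdot 5 \cdot 7)$, up to floating-point rounding.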
88.2 Euclid's proof of the infinitude of primes


If there were only a finite number of primes, then there would be some largest prime $p$.
However, $p! + 1$ is not divisible by any number $n$ with $1 < n \le p$, so $p! + 1$ cannot be
factored by the primes we already know. But every integer greater than one is divisible by
at least one prime, so there must be some prime greater than $p$ by which $p! + 1$ is divisible.
88.3 Mangoldt summatory function


A number theoretic function used in the study of prime numbers; specifically it was used in
the proof of the prime number theorem.


It is defined thus:

\[ \psi(x) = \sum_{r \le x} \Lambda(r) \]

where $\Lambda$ is the Mangoldt function.
The Mangoldt summatory function is valid for all positive real $x$.


Note that we do not have to worry that the inequality above is ambiguous, because $\Lambda(x)$ is
only non-zero for natural $x$. So no matter whether we take it to mean $r$ is real, integer or
natural, the result is the same because we just get a lot of zeros added to our answer.


The prime number theorem, which states

\[ \pi(x) \sim \frac{x}{\ln x} \]

where $\pi(x)$ is the prime counting function, is equivalent to the statement that:

\[ \psi(x) \sim x. \]

We can also define a "smoothing function" for the summatory function, defined as:

\[ \psi_1(x) = \int_0^x \psi(t)\,dt \]

and then the prime number theorem is also equivalent to:

\[ \psi_1(x) \sim \frac{1}{2}x^2 \]

which turns out to be easier to work with than the original form.
88.4 Mersenne numbers


Numbers of the form

\[ M_n = 2^n - 1, \quad (n \ge 1) \]

are called Mersenne numbers after Father Marin Mersenne, a French monk who wanted to
discover which such numbers are actually prime. Mersenne primes have a strong connection
with perfect numbers.


The currently known Mersenne primes are those $M_n$ with $n$ = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107,
127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209,
44497, 86243, 110503, 132049, 216091, 756839, 859433, 1257787, 1398269, 2976221, 3021377,
6972593, and 13466917.


It is conjectured that the number of Mersenne primes with exponent $p < x$ is of order

\[ \frac{e^\gamma}{\log 2} \log\log x \]

where $\gamma$ is Euler's constant.
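
The first few exponents in the list can be recovered by brute force. This sketch uses plain trial division, which is only practical for small exponents; the names `is_prime` and `mersenne_exponents` are illustrative.

```python
def is_prime(n):
    """Primality by trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Exponents n <= 31 for which the Mersenne number 2^n - 1 is prime.
mersenne_exponents = [n for n in range(1, 32) if is_prime(2 ** n - 1)]
```

Every exponent found this way is itself prime, since $2^{ab} - 1$ is divisible by $2^a - 1$.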
88.5 Thue's lemma


Let $p$ be a prime number of the form $4k + 1$. Then there are unique integers $a$ and $b$
with $0 < a < b$ such that $p = a^2 + b^2$. Additionally, if a number $p$ can be written as the
sum of two squares in 2 different ways, then the number $p$ is composite.
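
Both statements are easy to observe experimentally. The sketch below uses an exhaustive search (the name `two_squares` is illustrative) and lists every representation of $n$ as $a^2 + b^2$ with $0 < a \le b$.

```python
from math import isqrt

def two_squares(n):
    """All pairs (a, b) with 0 < a <= b and a*a + b*b == n, by exhaustive search."""
    reps = []
    a = 1
    while 2 * a * a <= n:
        b = isqrt(n - a * a)
        if b * b == n - a * a:
            reps.append((a, b))
        a += 1
    return reps

# Primes of the form 4k + 1 have exactly one representation.
for p in (5, 13, 17, 29, 37, 41, 53, 61, 73, 89, 97):
    assert len(two_squares(p)) == 1
```

By contrast, $65 = 1^2 + 8^2 = 4^2 + 7^2$ has two representations, and indeed $65 = 5 \cdot 13$ is composite.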
88.6 composite number


A composite number is a natural number which is not prime and not equal to 1. That is,
$n$ is composite if $n = ab$, with $a$ and $b$ natural numbers both not equal to 1.


Examples.

• 1 is not composite (and also not prime), by definition.
• 2 is not composite, as it is prime.
• 15 is composite, since $15 = 3 \cdot 5$.
• 93555 is composite, since $93555 = 3^3 \cdot 5 \cdot 7 \cdot 9 \cdot 11$.
• 52223 is not composite, since it is prime.
88.7 prime

An integer $p$ is prime if it has exactly two positive divisors. The first few positive prime
numbers are 2, 3, 5, 7, 11, ....

A prime number is often (but not always) required to be positive.
88.8 prime counting function


The prime counting function is a non-multiplicative function defined for any positive real number
$x$. It is denoted $\pi(x)$ and gives the number of primes not exceeding $x$. It usually takes a
positive integer $n$ for an argument. The first few values of $\pi(n)$ for $n = 1,2,3,\ldots$ are
0, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, ... (Sloane's sequence A000720).
The asymptotic behavior $\pi(x) \sim x/\ln x$ is given by the prime number theorem. This
function is closely related to Chebyshev's functions $\vartheta(x)$ and $\psi(x)$.
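
The listed values can be regenerated with a small sieve. In this sketch the name `prime_pi_table` is illustrative. It returns $\pi(0), \pi(1), \ldots, \pi(n)$ as a list, so that entry $k$ is the number of primes not exceeding $k$.

```python
def prime_pi_table(n):
    """Return [pi(0), pi(1), ..., pi(n)] for n >= 1, using a simple sieve."""
    is_prime = [False, False] + [True] * (n - 1)
    for i in range(2, n + 1):
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    table, count = [], 0
    for flag in is_prime:
        count += flag  # True counts as 1
        table.append(count)
    return table

table = prime_pi_table(100)
```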
88.9 prime difference function


The prime difference function is an arithmetic function for any positive integer $n$, denoted
as $d_n$, and gives the difference between two consecutive primes $p_n$ and $p_{n+1}$:

\[ d_n \equiv p_{n+1} - p_n. \]

For example:

• $d_1 = p_2 - p_1 = 3 - 2 = 1$,
• $d_{10} = p_{11} - p_{10} = 31 - 29 = 2$,
• $d_{100} = p_{101} - p_{100} = 547 - 541 = 6$,
• $d_{1000} = p_{1001} - p_{1000} = 7927 - 7919 = 8$,
• $d_{10000} = p_{10001} - p_{10000} = 104743 - 104729 = 14$ and so forth.

The first few values of $d_n$ for $n = 1,2,3,\ldots$ are 1, 2, 2, 4, 2, 4, 2, 4, 6, 2, 6, 4, 2, 4, 6, 6, 2, 6, 4, 2, ...
(Sloane's sequence A001223).
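
These values are quick to reproduce. A minimal sketch with illustrative names follows; it generates primes by trial division against the primes already found.

```python
def first_primes(count):
    """The first `count` primes, by trial division against earlier primes."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def prime_gaps(count):
    """The differences d_1, ..., d_count between consecutive primes."""
    ps = first_primes(count + 1)
    return [ps[i + 1] - ps[i] for i in range(count)]

gaps = prime_gaps(1000)
```

Since Python lists are indexed from 0, `gaps[n - 1]` holds $d_n$.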
88.10 prime number theorem


Define $\pi(x)$ as the number of primes less than or equal to $x$. The prime number theorem
asserts that

\[ \pi(x) \sim \frac{x}{\log x} \]

as $x \to \infty$, that is, $\pi(x) / \frac{x}{\log x}$ tends to 1 as $x$ increases. Here $\log x$ is the natural logarithm.


There is a sharper statement that is also known as the prime number theorem:

\[ \pi(x) = \operatorname{li} x + R(x), \]

where li is the logarithmic integral defined as

\[ \operatorname{li} x = \int_2^x \frac{dt}{\log t} = \frac{x}{\log x} + \frac{1!\,x}{(\log x)^2} + \cdots + \frac{(k-1)!\,x}{(\log x)^k} + O\left\{\frac{x}{(\log x)^{k+1}}\right\}, \]

and $R(x)$ is the error term whose behavior is still not fully known. From the work of Korobov
and Vinogradov on zeroes of the Riemann zeta-function it is known that

\[ R(x) = O\left\{x \exp\left(-c(\theta)(\log x)^{\theta}\right)\right\} \]

for every $\theta < \frac{3}{5}$. The unproven Riemann hypothesis is equivalent to the statement that

\[ R(x) = O\left(x^{1/2} \log x\right). \]

There exist a number of proofs of the prime number theorem. The original proofs by Hadamard
[4] and de la Vallée Poussin [7] called on analysis of behavior of the Riemann zeta function
$\zeta(s)$ near the line $\operatorname{Re} s = 1$ to deduce the estimates for $R(x)$. For a long time it was an open
problem to find an elementary proof of the prime number theorem ("elementary" meaning
"not involving complex analysis"). Finally Erdős and Selberg [8] [6] found such a proof.
Nowadays there are some very short proofs of the prime number theorem (for example, see
[5]).
88.11 prime number theorem result


Gauss discovered that $\pi(n)$ is approximately $\frac{n}{\ln n}$ but is also approximated by the function:

\[ \operatorname{Li}(x) = \int_2^x \frac{dt}{\ln t} \]
88.12 proof of Thue's Lemma


Let $p$ be a prime congruent to 1 mod 4. By Euler's criterion (or by Gauss's lemma), the
congruence

\[ x^2 \equiv -1 \pmod p \qquad (88.12.1) \]

has a solution. By Dirichlet's approximation theorem, there exist integers $a$ and $b$, with $1 \le a \le [\sqrt{p}]$, such that

\[ \left| a\frac{x}{p} - b \right| \le \frac{1}{[\sqrt{p}] + 1} < \frac{1}{\sqrt{p}}. \qquad (88.12.2) \]

(2) tells us

\[ |ax - bp| < \sqrt{p}. \]

Write $u = |ax - bp|$. We get

\[ u^2 + a^2 \equiv a^2x^2 + a^2 \equiv 0 \pmod p \]

and

\[ 0 < u^2 + a^2 < 2p, \]

whence $u^2 + a^2 = p$, as desired.


To prove in another way, we will imitate a part of the|proof of Lagrange’s four-square theoren 


From (1), we know that the equation 
a +y? = mp (88.12.3) 


has a solution (x, y, m) with, we may assume, 1 < m < p. It is enough to show that if m > 1, 
then there exists (u, v, n) such that 1 < n < m and 


u +u = np. 
If m is even, then x and y are both even or both odd; therefore, in the identity 
x+y “ke L-Y a r? +y? 
2 2 D 


both summands are integers, and we can just take n = m/2 and conclude. 
If $m$ is odd, write $a \equiv x \pmod{m}$ and $b \equiv y \pmod{m}$ with $|a| < m/2$ and $|b| < m/2$. We
get
\[ a^2 + b^2 = nm \]

for some $n < m$. But consider the identity
\[ (a^2 + b^2)(x^2 + y^2) = (ax + by)^2 + (ay - bx)^2 . \]
On the left is $nm^2p$, and on the right we see

\[ ax + by \equiv x^2 + y^2 \equiv 0 \pmod{m} \]
\[ ay - bx \equiv xy - yx = 0 \pmod{m} . \]

Thus we can divide the equation
\[ nm^2p = (ax + by)^2 + (ay - bx)^2 \]
through by $m^2$, getting an expression for $np$ as a sum of two squares. The proof is complete.


Remark: The solutions of the congruence (1) are explicitly

\[ x \equiv \pm \left( \frac{p-1}{2} \right)! \pmod{p} . \]
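
The first proof can be followed numerically. The sketch below is only an illustration, not part of the original entry: the function name `two_squares` is hypothetical, the root of $x^2 \equiv -1$ is taken from the Remark, and a plain search over $1 \le a \le [\sqrt{p}]$ stands in for Dirichlet's approximation theorem.

```python
from math import factorial, isqrt

def two_squares(p):
    """Return (u, a) with u*u + a*a == p, for a prime p congruent to 1 mod 4."""
    if p % 4 != 1:
        raise ValueError("p must be a prime congruent to 1 mod 4")
    # By the Remark, x = ((p-1)/2)! satisfies x^2 = -1 (mod p).
    x = factorial((p - 1) // 2) % p
    # Try every a in the range used in the proof.
    for a in range(1, isqrt(p) + 1):
        r = (a * x) % p
        u = min(r, p - r)  # |a*x - b*p| for the multiple b*p nearest to a*x
        if u * u + a * a == p:
            return u, a
    raise ValueError("no representation found; p is probably not prime")

print(two_squares(13))
```

The factorial makes this impractical for large $p$; it is meant only to mirror the argument.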
88.13 semiprime


A composite number which is the product of two (possibly equal) primes is called semiprime.


Such numbers are sometimes also called 2-almost primes. For example:


• 1 is not a semiprime because it is not a composite number or a prime,
• 2 is not a semiprime, as it is a prime,
• 4 is a semiprime, since $4 = 2 \cdot 2$,
• 8 is not a semiprime, since it is a product of three primes ($8 = 2 \cdot 2 \cdot 2$),
• 2003 is not a semiprime, as it is a prime,
• 2005 is a semiprime, since $2005 = 5 \cdot 401$,
• 2007 is not a semiprime, since it is a product of three primes ($2007 = 3 \cdot 3 \cdot 223$).
The first few semiprimes are 4, 6, 9, 10, 14, 15, 21, 22, 25, 26, 33, 34, 35, 38, 39, 46, 49, 51, 55, 57, 58, 62, ...
(Sloane's sequence A001358). The Moebius function $\mu(n)$ for semiprimes can be only equal
to 0 or 1. If we form an integer sequence of values of $\mu(n)$ for semiprimes we get a binary
sequence: 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, ... (Sloane's sequence A072165).


All the squares of primes are also semiprimes. The first few squares of primes are then
4, 9, 25, 49, 121, 169, 289, 361, 529, 841, 961, 1369, 1681, 1849, 2209, 2809, 3481, 3721, 4489, 5041, ...
(Sloane's sequence A001248). The Moebius function $\mu(n)$ for the squares of primes is always
equal to 0, as it is equal to 0 for every number divisible by the square of a prime.
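
The lists above can be reproduced with a few lines of code. This is a minimal sketch using trial division; the helper names `prime_factors`, `is_semiprime` and `moebius` are chosen here for illustration.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, found by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def is_semiprime(n):
    """A semiprime has exactly two prime factors, counted with multiplicity."""
    return len(prime_factors(n)) == 2

def moebius(n):
    """0 if a prime square divides n, otherwise (-1)^(number of prime factors)."""
    factors = prime_factors(n)
    if len(set(factors)) != len(factors):
        return 0
    return -1 if len(factors) % 2 else 1

semiprimes = [n for n in range(1, 63) if is_semiprime(n)]
print(semiprimes)
print([moebius(n) for n in semiprimes])
```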
88.14 sieve of Eratosthenes


The sieve of Eratosthenes is a simple algorithm for generating the prime numbers between 1 and
some arbitrary $N$.


Let $p = 2$, which is of course known to be prime. Mark all multiples of $p$ greater than $p$ itself (4, 6, 8, ...)
as composite. Now let $p$ be the smallest number above the previous $p$ that is not marked as composite (in this case, 3);
it must be the next prime. Again, mark all multiples of $p$ greater than $p$ as composite. Continue
this process while $p \le \sqrt{N}$. When done, all numbers from 2 up to $N$ that have not been marked
as composite are prime.
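
The procedure can be sketched as follows. One deviation from the description is an assumption of this sketch: marking starts at $p^2$ rather than $2p$, since smaller multiples of $p$ have already been marked by smaller primes.

```python
from math import isqrt

def sieve_of_eratosthenes(N):
    """Return the list of primes p with 2 <= p <= N."""
    if N < 2:
        return []
    composite = [False] * (N + 1)
    for p in range(2, isqrt(N) + 1):
        if not composite[p]:  # p is the next prime
            for multiple in range(p * p, N + 1, p):
                composite[multiple] = True
    return [n for n in range(2, N + 1) if not composite[n]]

print(sieve_of_eratosthenes(30))
```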


For many years, the sieve of Eratosthenes was the fastest known algorithm for generating
primes. Today, there are faster methods, such as a
88.15 test for primality of Mersenne numbers


Suppose $p$ is an odd prime and define a sequence $L_n$ recursively as
\[ L_0 = 4, \qquad L_{n+1} = (L_n^2 - 2) \bmod (2^p - 1) . \]


The number $2^p - 1$ is prime if and only if $L_{p-2} = 0$.
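

The recursion translates directly into code. A minimal sketch (the function name `lucas_lehmer` is chosen here; the input is assumed to be an odd prime, as in the statement):

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime p."""
    m = (1 << p) - 1          # the Mersenne number 2^p - 1
    L = 4                     # L_0
    for _ in range(p - 2):    # compute L_1, ..., L_{p-2}
        L = (L * L - 2) % m
    return L == 0

print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
```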


REFERENCES 


1. Donald E. Knuth. The Art of Computer Programming, volume 2. Addison-Wesley, 1969. 
Chapter 89


11A51 - Factorization; primality


89.1 Fermat Numbers


The $n$-th Fermat number is defined as:

\[ F_n = 2^{2^n} + 1 . \]
Fermat incorrectly conjectured that all these numbers were primes, although he had no
proof. The first 5 Fermat numbers: 3, 5, 17, 257, 65537 (corresponding to $n = 0, 1, 2, 3, 4$) are
all primes (so called Fermat primes). Euler was the first to point out the falsity of Fermat's
conjecture by proving that 641 is a divisor of $F_5$. (In fact, $F_5 = 641 \times 6700417$.) Moreover,
no other Fermat number is known to be prime for $n > 4$, so now it is conjectured that
those are all prime Fermat numbers. It is also unknown whether there are infinitely many
composite Fermat numbers or not.

One of the famous achievements of Gauss was to prove that the regular polygon of $m$ sides
can be constructed with ruler and compass if and only if $m$ can be written as
\[ m = 2^k F_{r_1} F_{r_2} \cdots F_{r_t} , \]
where $k \ge 0$ and the other factors are distinct primes of the form $F_n$.
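
Euler's factorization of $F_5$ is easy to confirm with exact integer arithmetic. A small sketch (the function name `fermat_number` is chosen here):

```python
def fermat_number(n):
    """The n-th Fermat number 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1

print([fermat_number(n) for n in range(5)])
print(fermat_number(5) % 641)
```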
89.2 Fermat compositeness test


The Fermat compositeness test states that for any odd integer $n > 0$, if there exists an
integer $b$ between 1 and $n - 1$ such that $b^{n-1} \not\equiv 1 \pmod{n}$, then $n$ is composite.

If $b^{n-1} \not\equiv 1 \pmod{n}$ then $b$ is a witness to $n$'s compositeness.
If $b^{n-1} \equiv 1 \pmod{n}$ then $n$ is pseudoprime base $b$.


The Fermat compositeness test is a fast way to prove compositeness of most numbers, but unfortunately
there are composite numbers that are pseudoprime in every base coprime to them. An example of
such a number is 561. These numbers are called Carmichael numbers (see EIS sequence
A002997 for a list of the first few Carmichael numbers).


Proof of the Fermat compositeness test. Suppose $n$ is prime. Then the Euler phi-function
of $n$ is given by $\phi(n) = n - 1$, and by Euler's theorem $b^{\phi(n)} \equiv 1 \pmod{n}$ for all
integers $b$ between 1 and $n - 1$. We can conclude that if this is not the case, then $n$ is not prime, so $n$ must be
composite.
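
The test is a single modular exponentiation. The sketch below (function names chosen here for illustration) also shows the behaviour of the Carmichael number 561: its only witnesses are bases sharing a factor with it.

```python
from math import gcd

def is_fermat_witness(b, n):
    """True if b^(n-1) is not congruent to 1 mod n, which proves n composite."""
    return pow(b, n - 1, n) != 1

def fermat_witnesses(n):
    """All witnesses b with 1 <= b <= n - 1."""
    return [b for b in range(1, n) if is_fermat_witness(b, n)]

print(fermat_witnesses(13))          # a prime has no witnesses
print(is_fermat_witness(2, 15))      # 2 proves that 15 is composite
print(all(gcd(b, 561) > 1 for b in fermat_witnesses(561)))
```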
89.3 Zsigmondy's theorem


For all positive integers $q > 1$ and $n > 1$, there exists a prime $p$ which divides $q^n - 1$ but
doesn't divide $q^m - 1$ for $0 < m < n$, except when $q = 2^k - 1$ and $n = 2$ or $q = 2$ and $n = 6$.
89.4 divisibility


Given integers $a$ and $b$, we say $a$ divides $b$ if and only if there is some $q \in \mathbb{Z}$ such that
$b = qa$.


There are many ways to notate this relationship:


• $a \mid b$ (read "a divides b")
• $b$ is divisible by $a$
• $a$ is a factor of $b$
• $a$ is a divisor of $b$


The notion of divisibility can apply to other rings (e.g., polynomials).
89.5 division algorithm for integers


Given any two integers $a, b$ where $b > 0$, there exists a unique pair of integers $q, r$ such that
$a = qb + r$ and $0 \le r < b$. $q$ is called the quotient of $a$ and $b$, and $r$ is the remainder.


The division algorithm is not an algorithm at all but rather a theorem. Its name probably
derives from the fact that it was first proved by showing that an algorithm to calculate the
quotient of two integers yields this result.


There are similar forms of the division algorithm that apply to other rings (for example,
polynomials).
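
For integers, Python's built-in `divmod` already returns the pair described by the theorem when $b > 0$, because its quotient is rounded toward minus infinity. A short sketch (the wrapper name `divide` is chosen here):

```python
def divide(a, b):
    """Return (q, r) with a == q*b + r and 0 <= r < b, for b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    return divmod(a, b)

print(divide(17, 5))
print(divide(-7, 3))
```

Note that languages whose integer division truncates toward zero (C, for instance) can return a negative remainder for negative $a$, which is not the pair of the theorem.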
89.6 proof of division algorithm for integers


Let $a, b$ be integers ($b > 0$). We want to show that $a = bq + r$ for some integers $q, r$ with $0 \le r < b$
and that such an expression is unique.


Consider the numbers
\[ \ldots, a - 3b, a - 2b, a - b, a, a + b, a + 2b, a + 3b, \ldots \]


From all these numbers, there has to be a smallest non-negative one. Let it be $r$. Since
$r = a - qb$ for some $q$ (see footnote 1), we have $a = bq + r$. And, if $r \ge b$ then $r$ wasn't the smallest non-negative
number on the list, since the previous one (equal to $r - b$) would also be non-negative.
Thus $0 \le r < b$.


So far, we have proved that we can express $a$ as $bq + r$ for some pair of integers $q, r$ such that
$0 \le r < b$. Now we will prove the uniqueness of such an expression.


Let $q'$ and $r'$ be another pair of integers satisfying $a = bq' + r'$ and $0 \le r' < b$. Suppose $r \ne r'$.
Since $r' = a - bq'$ is a number on the list, it cannot be smaller than or equal to $r$, and thus $r < r'$.
Notice that

\[ 0 < r' - r = (a - bq') - (a - bq) = b(q - q') \]


so $b$ divides $r' - r$, which is impossible since $0 < r' - r < b$. We conclude that $r' = r$. Finally,
if $r = r'$ then $a - bq = a - bq'$ and therefore $q = q'$. This concludes the proof of the uniqueness
part.


Version: 1 Owner: drini Author(s): drini 


1 For example, if r = a + 5b then q = —5. 
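89.7 square-free number


A square-free number is a natural number that contains no powers greater than 1 in its
prime factorization. In other words, if $x$ is our number, and

\[ x = \prod_{i=1}^{r} p_i^{a_i} \]

is the prime factorization of $x$ into $r$ distinct primes, then $a_i \ge 2$ is always false for square-free
$x$.


The name derives from the fact that if any $a_i$ were to be greater than or equal to two, we
could be sure that at least one square divides $x$ (namely, $p_i^2$).


The asymptotic density of square-free numbers is $\frac{6}{\pi^2}$, which can be proved by application of
a square-free variation of the sieve of Eratosthenes as follows:
\begin{align*}
A(n) &= \sum_{k \le n} [k \text{ is squarefree}] \\
&= \sum_{k \le n} \sum_{d^2 \mid k} \mu(d) \\
&= \sum_{d \le \sqrt{n}} \mu(d) \sum_{k \le n,\ d^2 \mid k} 1 \\
&= \sum_{d \le \sqrt{n}} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor \\
&= n \sum_{d \le \sqrt{n}} \frac{\mu(d)}{d^2} + O(\sqrt{n}) \\
&= n \sum_{d \ge 1} \frac{\mu(d)}{d^2} + O(\sqrt{n}) \\
&= \frac{6}{\pi^2}\, n + O(\sqrt{n}) .
\end{align*}


It was shown that the Riemann hypothesis implies the error term $O(n^{7/22+\epsilon})$ in the above.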
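
The identity $A(n) = \sum_{d \le \sqrt{n}} \mu(d) \lfloor n/d^2 \rfloor$ from the derivation gives a quick way to count square-free numbers. A minimal sketch, with a trial-division Moebius function written here for self-containment:

```python
from math import isqrt, pi

def moebius(n):
    """Moebius function by trial division."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:      # d^2 divided the original n
                return 0
            result = -result
        d += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result

def count_squarefree(n):
    """Number of square-free integers k with 1 <= k <= n."""
    return sum(moebius(d) * (n // (d * d)) for d in range(1, isqrt(n) + 1))

print(count_squarefree(100))
print(count_squarefree(10**5) / 10**5, 6 / pi**2)
```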
89.8 squarefull number


A natural number $n$ is called squarefull (or powerful) if for every prime $p \mid n$ we have $p^2 \mid n$. In
1978 Erdős conjectured that we cannot have three consecutive squarefull natural numbers.
If we assume the ABC conjecture, there are only finitely many such consecutive triples.
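89.9 the prime power dividing a factorial


In 1808, Legendre showed that the exact power of a prime $p$ dividing $n!$ is

\[ \sum_{i=1}^{K} \left\lfloor \frac{n}{p^i} \right\rfloor , \]
where $K$ is the largest integer for which $p^K \le n$.


If $p > n$ then $p$ doesn't divide $n!$, and its power is 0, and the sum above is empty. So let
the prime $p \le n$.


For each $1 \le i \le K$, there are $\lfloor n/p^i \rfloor - \lfloor n/p^{i+1} \rfloor$ numbers between 1 and $n$ with $p^i$ being the
greatest power of $p$ dividing each. So the power of $p$ dividing $n!$ is

\[ \sum_{i=1}^{K} i \cdot \left( \left\lfloor \frac{n}{p^i} \right\rfloor - \left\lfloor \frac{n}{p^{i+1}} \right\rfloor \right) . \]

But each $\lfloor n/p^i \rfloor$, $i \ge 2$, in the sum appears with factors $i$ and $-(i - 1)$, so the above sum equals
\[ \sum_{i=1}^{K} \left\lfloor \frac{n}{p^i} \right\rfloor . \]


Corollary 1.

\[ \sum_{k=1}^{K} \left\lfloor \frac{n}{p^k} \right\rfloor = \frac{n - \delta_p(n)}{p - 1} , \]
where $\delta_p$ denotes the sum of digits function in base $p$.


If $n < p$, then $\delta_p(n) = n$ and $\lfloor n/p \rfloor$ is $0 = \frac{n - \delta_p(n)}{p - 1}$. So we assume $p \le n$.
Let $n_K n_{K-1} \cdots n_0$ be the p-adic representation of $n$. Then

\begin{align*}
\frac{n - \delta_p(n)}{p - 1} &= \frac{1}{p - 1} \sum_{k=0}^{K} n_k (p^k - 1) \\
&= \sum_{k=1}^{K} n_k \left( p^{k-1} + p^{k-2} + \cdots + 1 \right) \\
&= (n_1 + n_2 p + \ldots + n_K p^{K-1}) \\
&\quad + (n_2 + n_3 p + \ldots + n_K p^{K-2}) \\
&\quad \;\vdots \\
&\quad + n_K \\
&= \sum_{k=1}^{K} \left\lfloor \frac{n}{p^k} \right\rfloor .
\end{align*}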
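
Both Legendre's sum and the corollary can be checked against each other numerically. A minimal sketch (function names chosen here):

```python
def legendre_exponent(n, p):
    """Exponent of the prime p in n!, as the sum of floor(n / p^i)."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total

def digit_sum(n, p):
    """Sum of the base-p digits of n (the function delta_p)."""
    s = 0
    while n:
        n, d = divmod(n, p)
        s += d
    return s

n, p = 100, 5
print(legendre_exponent(n, p), (n - digit_sum(n, p)) // (p - 1))
```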
Chapter 90


11A55 - Continued fractions


90.1 Stern-Brocot tree


If we start with the fractions representing zero and infinity,

\[ \frac{0}{1}, \frac{1}{0}, \]
and then between adjacent fractions $\frac{m}{n}$ and $\frac{m'}{n'}$ we insert the fraction $\frac{m+m'}{n+n'}$, then we obtain
\[ \frac{0}{1}, \frac{1}{1}, \frac{1}{0}. \]
Repeating the process, we get
\[ \frac{0}{1}, \frac{1}{2}, \frac{1}{1}, \frac{2}{1}, \frac{1}{0}, \]

and then

\[ \frac{0}{1}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{1}{1}, \frac{3}{2}, \frac{2}{1}, \frac{3}{1}, \frac{1}{0}, \]
and so forth. It can be proven that every irreducible fraction appears at some iteration
[1]. The process can be represented graphically by means of the so-called Stern-Brocot tree,
named after its discoverers, Moritz Stern and Achille Brocot.
[Figure: the first levels of the Stern-Brocot tree, with the fraction 1/1 at the top.]


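If we specify the position of a fraction in the tree as a path consisting of L(eft) and R(ight) moves
along the tree starting from the top (fraction $\frac{1}{1}$), and also define matrices

\[ L = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad R = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \]

then the product of the matrices corresponding to the path is the matrix $\begin{pmatrix} n & n' \\ m & m' \end{pmatrix}$ whose entries are
numerators and denominators of the parent fractions $\frac{m}{n}$ and $\frac{m'}{n'}$. For example, the path leading to the fraction
$\frac{3}{5}$ is LRL. The corresponding matrix product is

\[ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ 1 & 2 \end{pmatrix}, \]
and the parents of $\frac{3}{5}$ are $\frac{1}{2}$ and $\frac{2}{3}$.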
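
The walk along a path can also be carried out without matrices, by keeping the two current parent fractions and taking their mediant. The following is a sketch under that reformulation; the function name `stern_brocot` and the tuple representation (numerator, denominator) are choices made here.

```python
def stern_brocot(path):
    """Follow a path of 'L'/'R' moves from 1/1.

    Returns (fraction, left_parent, right_parent), each as a
    (numerator, denominator) pair.
    """
    left, right = (0, 1), (1, 0)          # the fractions 0/1 and 1/0
    for move in path:
        mediant = (left[0] + right[0], left[1] + right[1])
        if move == "L":
            right = mediant               # go left: current node becomes right parent
        elif move == "R":
            left = mediant                # go right: current node becomes left parent
        else:
            raise ValueError("path may contain only 'L' and 'R'")
    fraction = (left[0] + right[0], left[1] + right[1])
    return fraction, left, right

print(stern_brocot("LRL"))
```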


REFERENCES 


1. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. Addison- 
Wesley, 1998. Zbl 0836.00001 
90.2 continued fraction


Let $(a_n)_{n \ge 1}$ be a sequence of positive real numbers, and let $a_0$ be any real number. Consider
the sequence

\begin{align*}
c_1 &= a_0 + \frac{1}{a_1} \\
c_2 &= a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2}} \\
c_3 &= a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3}}} \\
c_4 &= \ldots
\end{align*}


The limit $c$ of this sequence, if it exists, is called the value or limit of the infinite continued
fraction with convergents $(c_n)$, and is denoted by

\[ a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} \]
or by
\[ a_0 + \frac{1}{a_1 +}\, \frac{1}{a_2 +}\, \frac{1}{a_3 +} \cdots \]
In the same way, a finite sequence
\[ (a_n)_{1 \le n \le k} \]

defines a finite sequence
\[ (c_n)_{1 \le n \le k} . \]


We then speak of a finite continued fraction with value $c_k$.
An archaic word for a continued fraction is anthyphairetic ratio.


If the denominators $a_n$ are all (positive) integers, we speak of a simple continued fraction.


We then use the notation $q = (a_0; a_1, a_2, a_3, \ldots)$ or, in the finite case, $q = (a_0; a_1, a_2, a_3, \ldots, a_n)$.


It is not hard to prove that any irrational number $c$ is the value of a unique infinite simple
continued fraction. Moreover, if $c_n$ denotes its nth convergent, then $c - c_n$ is an alternating
sequence and $|c - c_n|$ is decreasing (as well as convergent to zero). Also, the value of an
infinite simple continued fraction is perforce irrational.


Any rational number is the value of two and only two finite simple continued fractions; in one of
them, the last denominator is 1. E.g.

\[ \frac{43}{30} = (1; 2, 3, 4) = (1; 2, 3, 3, 1) . \]


These two conditions on a real number $c$ are equivalent:

1. $c$ is a root of an irreducible quadratic polynomial with integer coefficients.

2. $c$ is irrational and its simple continued fraction is "eventually periodic"; i.e.

\[ c = (a_0; a_1, a_2, \ldots) \]
and, for some integer $m$ and some integer $k > 0$, we have $a_n = a_{n+k}$ for all $n \ge m$.


For example, consider the quadratic equation for the golden ratio

\[ x^2 = x + 1 \]
or equivalently
\[ x = 1 + \frac{1}{x} . \]
We get
\begin{align*}
x &= 1 + \cfrac{1}{1 + \cfrac{1}{x}} \\
&= 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{x}}}
\end{align*}

and so on. If $x > 0$, we therefore expect
\[ x = (1; 1, 1, 1, \ldots) \]
which indeed can be proved. As an exercise, you might like to look for a continued fraction
expansion of the other solution of $x^2 = x + 1$.


Although $e$ is transcendental, there is a surprising pattern in its simple continued fraction
expansion.
\[ e = (2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, \ldots) \]


No pattern is apparent in the expansions of some other well-known constants,
such as the transcendental number $\pi$ and the Euler-Mascheroni constant $\gamma$.
Owing to a kinship with the Euclidean division algorithm, continued fractions arise naturally
in number theory. An interesting example is the Pell Diophantine equation

\[ x^2 - Dy^2 = 1 \]


where $D$ is a nonsquare integer $> 0$. It turns out that if $(x, y)$ is any solution of the Pell
equation other than $(\pm 1, 0)$, then $|x/y|$ is a convergent to $\sqrt{D}$.


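$\frac{22}{7}$ and $\frac{355}{113}$ are well-known rational approximations to $\pi$, and indeed both are convergents to
$\pi$:
\begin{align*}
3.14159265\ldots &= \pi = (3; 7, 15, 1, 292, \ldots) \\
3.14285714\ldots &= \frac{22}{7} = (3; 7) \\
3.14159292\ldots &= \frac{355}{113} = (3; 7, 15, 1) = (3; 7, 16)
\end{align*}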

For one more example, the distribution of leap years in the 4800-month cycle of the Gregorian 
calendar can be interpreted (loosely speaking) in terms of the continued fraction expansion 
of the number of days in a solar year. 
Chapter 91


11A63 - Radix representation; digital
problems


91.1 Kummer's theorem


Given integers $n \ge m \ge 0$ and a prime number $p$, the power of $p$ dividing $\binom{n}{m}$ is equal
to the number of carries when adding $m$ and $n - m$ in base $p$.


Proof:
For the proof we can allow base $p$ representations of numbers with leading zeros. So let
\[ n_d n_{d-1} \cdots n_0 := n, \]
\[ m_d m_{d-1} \cdots m_0 := m, \]

all in base $p$. We set $r = n - m$ and denote the p-adic representation of $r$ with $r_d r_{d-1} \cdots r_0$.


We define $c_{-1} = 0$, and for each $0 \le j \le d$

\[ c_j = \begin{cases} 1 & \text{for } m_j + r_j + c_{j-1} \ge p \\ 0 & \text{otherwise.} \end{cases} \tag{91.1.1} \]


Finally, we introduce, as in the corollary in the entry on the prime power dividing a given factorial,
$\delta_p(n)$ as the sum of digits in the p-adic representation of $n$. Then it follows that the power
of $p$ dividing $\binom{n}{m}$ is
\[ \frac{\delta_p(m) + \delta_p(r) - \delta_p(n)}{p - 1} . \]
For each $j \ge 0$, we have
\[ n_j = m_j + r_j + c_{j-1} - p \cdot c_j . \]
Then
\begin{align*}
\delta_p(m) + \delta_p(r) - \delta_p(n) &= \sum_{k=0}^{d} (m_k + r_k - n_k) \\
&= \sum_{k=0}^{d} (p - 1) c_k + \sum_{k=0}^{d} (c_k - c_{k-1}) \\
&= (p - 1) \sum_{k=0}^{d} c_k + c_d - c_{-1} ,
\end{align*}
and $c_d - c_{-1} = 0$, because $c_{-1} = 0$ by definition and no carry leaves the leading position $d$ of $n$.


Hence we have
\[ \frac{\delta_p(m) + \delta_p(r) - \delta_p(n)}{p - 1} = \sum_{k=0}^{d} c_k , \]
the total number of carries.
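
The theorem is easy to test by brute force. A minimal sketch (function names chosen here) that counts carries digit by digit and compares with the exact power of $p$ in the binomial coefficient, using `math.comb` from the standard library (Python 3.8 or later):

```python
from math import comb

def carries(m, r, p):
    """Number of carries when adding m and r in base p."""
    count = carry = 0
    while m or r:
        s = m % p + r % p + carry      # digit sum including incoming carry
        carry = 1 if s >= p else 0
        count += carry
        m //= p
        r //= p
    return count

def valuation(x, p):
    """Largest e such that p^e divides x (x > 0)."""
    e = 0
    while x % p == 0:
        x //= p
        e += 1
    return e

n, m, p = 10, 3, 2
print(carries(m, n - m, p), valuation(comb(n, m), p))
```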
91.2 corollary of Kummer's theorem


As shown in Kummer's theorem, the power of a prime number $p$ dividing $\binom{n}{m}$, $n \ge m \in \mathbb{N}$,
is the total number of carries when adding $m$ and $n - m$ in base $p$. We'll give an explicit
formula for the carry indicator.


Given integers $n \ge m \ge 0$ and a prime number $p$, let $n_i, m_i, r_i$ be the $i$-th digits in the p-adic
representations of $n$, $m$, and $r := n - m$, respectively.


Define $c_{-1} = 0$, and for each integer $i \ge 0$ we define

\[ c_i = \begin{cases} 1 & \text{if } m_i + r_i + c_{i-1} \ge p, \\ 0 & \text{otherwise.} \end{cases} \]


For each $i \ge 0$ we have
\[ n_i = m_i + r_i + c_{i-1} - p \cdot c_i . \]


Starting with the $i$-th digit of $n$, we multiply with increasing powers of $p$ to get

\[ \sum_{k=i}^{d} n_k p^{k-i} = \sum_{k=i}^{d} (m_k + r_k) p^{k-i} + \sum_{k=i}^{d} \left( c_{k-1} p^{k-i} - c_k p^{k-i+1} \right) . \]


The last sum in the above equation leaves only the values for indices $i$ and $d$, and since $c_d = 0$ we get
\[ \left\lfloor \frac{n}{p^i} \right\rfloor = \left\lfloor \frac{m}{p^i} \right\rfloor + \left\lfloor \frac{r}{p^i} \right\rfloor + c_{i-1} \]
for all $i \ge 0$.
Chapter 92


11A67 - Other representations


92.1 Sierpinski Erdos egyptian fraction conjecture


Erdős and Sierpiński conjectured that for any integer $n > 3$ there exist positive integers
$a, b, c$ so that:
\[ \frac{5}{n} = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} . \]
92.2 adjacent fraction


Two fractions $\frac{a}{b}$ and $\frac{c}{d}$, $\frac{a}{b} > \frac{c}{d}$, of the positive integers $a, b, c, d$ are adjacent if their difference
is some unit fraction $\frac{1}{n}$, $n > 0$, that is, if we can write:

\[ \frac{a}{b} - \frac{c}{d} = \frac{1}{n} . \]

For example the two proper fractions and unit fractions $\frac{1}{11}$ and $\frac{1}{12}$ are adjacent since:

\[ \frac{1}{11} - \frac{1}{12} = \frac{1}{132} . \]
$\frac{1}{17}$ and $\frac{1}{19}$ are not since:
\[ \frac{1}{17} - \frac{1}{19} = \frac{2}{323} . \]
It is not necessary of course that fractions are both proper fractions:
\[ \frac{20}{19} - \frac{19}{19} = \frac{1}{19} \]
or unit fractions:
\[ \frac{3}{4} - \frac{2}{3} = \frac{1}{12} . \]


All successive terms of some Farey sequence $F_n$ of a degree $n$ are always adjacent fractions.
In the first Farey sequence $F_1$ of a degree 1 there are only two adjacent fractions, namely $\frac{1}{1}$
and $\frac{0}{1}$.


Adjacent unit fractions can be parts of many Egyptian fractions:

\[ \frac{1}{70} + \frac{1}{71} = \frac{141}{4970} . \]
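
The definition amounts to checking that the reduced difference has numerator 1. A small sketch using exact fractions (the function name `are_adjacent` is chosen here):

```python
from fractions import Fraction

def are_adjacent(x, y):
    """True if |x - y| is a unit fraction 1/n with n > 0."""
    difference = abs(Fraction(x) - Fraction(y))   # Fraction keeps lowest terms
    return difference.numerator == 1

print(are_adjacent(Fraction(1, 11), Fraction(1, 12)))
print(are_adjacent(Fraction(1, 17), Fraction(1, 19)))
```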
92.3 any rational number is a sum of unit fractions


Representation


Any rational number $\frac{a}{b} \in \mathbb{Q}$ between 0 and 1 can be represented as a sum of different
unit fractions. This result was known to the Egyptians, whose way for representing rational
numbers was as a sum of different unit fractions.


The following greedy algorithm can represent any $0 \le \frac{a}{b} < 1$ as such a sum:


1. If $a = 0$, terminate. Otherwise let
\[ n = \left\lceil \frac{b}{a} \right\rceil \]
be the smallest natural number for which $\frac{1}{n} \le \frac{a}{b}$.
2. Output $\frac{1}{n}$ as the next term of the sum.
3. Continue from step 1, setting
\[ \frac{a}{b} := \frac{a}{b} - \frac{1}{n} . \]

Proof of correctness


The algorithm can never output the same unit fraction twice. Indeed, any $n$ selected in step
1 is at least 2, so $\frac{1}{n-1} \le \frac{2}{n}$; hence the same $n$ cannot be selected twice by the algorithm, as
then $n - 1$ could have been selected instead of $n$.


It remains to prove that the algorithm terminates. We do this by induction on $a$.


For $a = 0$: The algorithm terminates immediately.
For $a > 0$: The $n$ selected in step 1 satisfies
\[ b \le an < b + a . \]

So
\[ \frac{a}{b} - \frac{1}{n} = \frac{an - b}{bn} , \]

and $0 \le an - b < a$; by the induction hypothesis, the algorithm terminates for $\frac{a}{b} - \frac{1}{n}$.


Problems


1. The greedy algorithm always works, but it tends to produce unnecessarily large denominators.
For instance,

\[ \frac{47}{60} = \frac{1}{3} + \frac{1}{4} + \frac{1}{5} \]

but the greedy algorithm selects $\frac{1}{2}$, leading to the representation
\[ \frac{47}{60} = \frac{1}{2} + \frac{1}{4} + \frac{1}{30} . \]

2. The representation is never unique. For instance, for any $n$ we have the representations

\[ \frac{1}{n} = \frac{1}{n+1} + \frac{1}{n \cdot (n+1)} . \]

So given any one representation of $\frac{a}{b}$ as a sum of different unit fractions we can take
the largest denominator appearing, $n$, and replace it with two (larger) denominators.
Continuing the process indefinitely, we see infinitely many such representations, always.
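
The three steps of the greedy algorithm can be sketched with exact fractions. The function name `greedy_egyptian` is chosen here, and the ceiling $\lceil b/a \rceil$ is computed with integer arithmetic to avoid floating point.

```python
from fractions import Fraction

def greedy_egyptian(q):
    """Denominators of the greedy unit-fraction expansion of 0 <= q < 1."""
    q = Fraction(q)
    if not 0 <= q < 1:
        raise ValueError("q must satisfy 0 <= q < 1")
    denominators = []
    while q > 0:                                   # step 1: stop when a = 0
        n = -(-q.denominator // q.numerator)       # ceil(b / a)
        denominators.append(n)                     # step 2: output 1/n
        q -= Fraction(1, n)                        # step 3: subtract and repeat
    return denominators

print(greedy_egyptian(Fraction(47, 60)))
```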
92.4 conjecture on fractions with odd denominators


Egyptian fractions raise many open problems; this is one of the most famous of them.


Suppose we wish to write fractions as sums of distinct unit fractions with odd denominators.
Obviously, every such sum will have a reduced representation with an odd denominator.


For instance, the greedy algorithm applied to $\frac{2}{7}$ gives $\frac{1}{4} + \frac{1}{28}$, but we may also write $\frac{2}{7}$ as
\[ \frac{1}{7} + \frac{1}{9} + \frac{1}{35} + \frac{1}{315} . \]
It is known that we can represent every positive rational number with odd denominator as a sum
of distinct unit fractions with odd denominators.


However it is not known whether the greedy algorithm works when limited to odd denominators.


Conjecture 1. For any fraction $0 \le \frac{a}{2k+1} < 1$ with odd denominator, if we repeatedly
subtract the largest unit fraction with odd denominator that is not larger than our fraction, we
will eventually reach 0.
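
The restricted greedy procedure of the conjecture can be tried on examples. Since termination is exactly what is unknown, this sketch (function name and the `max_terms` cap are choices made here) gives up after a fixed number of terms instead of looping forever.

```python
from fractions import Fraction

def odd_greedy(q, max_terms=50):
    """Odd denominators chosen greedily for a fraction q with odd denominator."""
    q = Fraction(q)
    denominators = []
    while q > 0:
        if len(denominators) >= max_terms:
            raise RuntimeError("no termination within max_terms steps")
        n = -(-q.denominator // q.numerator)   # smallest n with 1/n <= q
        if n % 2 == 0:
            n += 1                             # next odd denominator
        denominators.append(n)
        q -= Fraction(1, n)
    return denominators

print(odd_greedy(Fraction(2, 7)))
```

Denominators can grow very quickly, which is why exact integer arithmetic is used throughout.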
92.5 unit fraction


A unit fraction $\frac{n}{d}$ is a fraction whose numerator $n = 1$. If its integer denominator $d > 1$,
then the fraction is also a proper fraction. So there is only one unit fraction which is improper,
namely 1.


Such fractions are known from Egyptian mathematics, where we can find a lot of special
representations of numbers as a sum of unit fractions, which are now called Egyptian
fractions. From the Rhind papyrus as an example:

\[ \frac{2}{71} = \frac{1}{40} + \frac{1}{568} + \frac{1}{710} . \]


Many unit fractions occur in pairs of adjacent fractions. Unit fractions are
successive or non-successive terms of any Farey sequence $F_n$ of a degree $n$. For example the
fractions $\frac{1}{2}$ and $\frac{1}{4}$ are adjacent, but they are not successive terms in the Farey sequence
$F_5$. The fractions $\frac{1}{3}$ and $\frac{1}{4}$ are also adjacent, and they are successive terms in $F_5$.
Chapter 93


11A99 - Miscellaneous


93.1 ABC conjecture


Suppose we have three mutually coprime integers $A, B, C$ satisfying $A + B = C$. Given any
$\epsilon > 0$, the ABC conjecture is that there is a constant $\kappa(\epsilon)$ such that

\[ \max(|A|, |B|, |C|) \le \kappa(\epsilon) (\mathrm{rad}(ABC))^{1+\epsilon} \]

where rad is the radical of an integer. This conjecture was formulated by Masser and Oesterlé
in 1985.


The ABC conjecture is considered one of the most important unsolved problems in number
theory, as many results would follow directly from this conjecture. For example, Fermat's last theorem
could be proved (for sufficiently large exponents) with perhaps one page worth of proof.
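
To get a feeling for the inequality, one can compute the radical for a concrete triple. The sketch below uses a trial-division `rad` (a name chosen here) and the triple $2 + 3^{10} \cdot 109 = 23^5$, in which $C$ is far larger than $\mathrm{rad}(ABC)$.

```python
def rad(n):
    """Product of the distinct primes dividing n (rad(1) = 1)."""
    n = abs(n)
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            result *= d
            while n % d == 0:
                n //= d
        d += 1
    return result * n if n > 1 else result

A, B, C = 2, 3**10 * 109, 23**5
print(A + B == C, rad(A * B * C), C)
```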


Further Reading 


An interesting and elementary article on the ABC conjecture can be found at http://www.maa.org/mathlan
Version: 9 Owner: KimJ Author(s): KimJ 
93.2 Suranyi theorem 


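Every integer $k$ can be expressed as the following sum:

\[ k = \pm 1^2 \pm 2^2 \pm \cdots \pm m^2 \]
for some $m \in \mathbb{Z}^+$.
We firstly note that:
\begin{align*}
0 &= 1^2 + 2^2 - 3^2 + 4^2 - 5^2 - 6^2 + 7^2 \\
1 &= 1^2 \\
2 &= -1^2 - 2^2 - 3^2 + 4^2 \\
3 &= -1^2 + 2^2 \\
4 &= -1^2 - 2^2 + 3^2
\end{align*}
Since reversing every sign turns a representation of $k$ into one of $-k$, we may take $k \ge 0$, and
it suffices to prove that if the theorem is true for $k$ then it is also true for $k + 4$.


As
\[ (m+1)^2 - (m+2)^2 - (m+3)^2 + (m+4)^2 = 4 \]


it's simple to finish the proof:
if $k = \pm 1^2 \pm \cdots \pm m^2$ then

\[ (k+4) = \pm 1^2 \pm \cdots \pm m^2 + (m+1)^2 - (m+2)^2 - (m+3)^2 + (m+4)^2 \]
and we are done.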
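
The proof is constructive, so it can be turned into a short routine. In this sketch (the names `BASE` and `suranyi_signs` are chosen here) the base cases for $0, 1, 2, 3$ are extended by blocks of signs $+, -, -, +$, each of which adds 4.

```python
# Sign patterns for the base cases: BASE[r] represents r for r = 0, 1, 2, 3.
BASE = {
    0: [1, 1, -1, 1, -1, -1, 1],
    1: [1],
    2: [-1, -1, -1, 1],
    3: [-1, 1],
}

def suranyi_signs(k):
    """Signs s_1..s_m with k == s_1*1^2 + s_2*2^2 + ... + s_m*m^2."""
    if k < 0:
        return [-s for s in suranyi_signs(-k)]
    return BASE[k % 4] + [1, -1, -1, 1] * (k // 4)

signs = suranyi_signs(10)
print(signs, sum(s * i * i for i, s in enumerate(signs, start=1)))
```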
93.3 irrational to an irrational power can be rational


Let $A = \sqrt{2}^{\sqrt{2}}$. If $A$ is a rational number, we are finished. Otherwise, if $A$ is an irrational number,
let $B = A^{\sqrt{2}}$. Then $B = \sqrt{2}^{\sqrt{2} \cdot \sqrt{2}} = \sqrt{2}^{2} = 2$ is rational. Hence an irrational number to an irrational
power can be a rational number. (In fact, it follows from the Gelfond-Schneider theorem
that $A$ is a transcendental number and hence an irrational number.)
93.4 triangular numbers


The triangular numbers are defined by the series

\[ t_n = \sum_{i=1}^{n} i . \]
That is, the nth triangular number is simply the sum of the first $n$ natural numbers. The
first few triangular numbers are
1, 3, 6, 10, 15, 21, 28, ...


The name triangular number comes from the fact that the summation defining $t_n$ can be
visualized as the number of dots in a triangular array,
where the number of rows is equal to $n$.


The closed-form for the triangular numbers is

\[ t_n = \frac{n(n + 1)}{2} . \]
Legend has it that a grammar-school-aged Gauss was told by his teacher to sum up all the
numbers from 1 to 100. He reasoned that each number $i$ could be paired up with $101 - i$, to
form a sum of 101, and if this was done 100 times, it would result in twice the actual sum
(since each number would get used twice due to the pairing). Hence, the sum would be

\[ 1 + 2 + 3 + \cdots + 100 = \frac{100(101)}{2} . \]


The same line of reasoning works to give us the closed form for any $n$.


Another way to derive the closed form is to assume that the nth triangular number is less
than or equal to the nth square (that is, each row is less than or equal to $n$, so the sum of all
rows must be less than or equal to $n \cdot n$ or $n^2$), and then use the first few triangular numbers
to solve the general 2nd degree polynomial $An^2 + Bn + C$ for $A$, $B$, and $C$. This leads to
$A = 1/2$, $B = 1/2$, and $C = 0$, which is the same as the above formula for $t(n)$.
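
The second derivation can be carried out mechanically: the three conditions $t_1 = 1$, $t_2 = 3$, $t_3 = 6$ determine $A$, $B$, $C$. The sketch below solves them by finite differences (the function names are chosen here; the elimination steps are ordinary algebra, not taken from the entry).

```python
from fractions import Fraction

def triangular(n):
    """Closed form n(n + 1)/2."""
    return n * (n + 1) // 2

def fit_quadratic(t1, t2, t3):
    """Solve A*n^2 + B*n + C = t_n for n = 1, 2, 3."""
    A = Fraction(t3 - 2 * t2 + t1, 2)   # second difference equals 2A
    B = (t2 - t1) - 3 * A               # first difference equals 3A + B
    C = t1 - A - B
    return A, B, C

print(fit_quadratic(1, 3, 6))
print(triangular(100))
```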
Chapter 94


11B05 - Density, gaps, topology


94.1 Cauchy-Davenport theorem


If $A$ and $B$ are non-empty subsets of $\mathbb{Z}_p$, where $p$ is prime, then
\[ |A + B| \ge \min(|A| + |B| - 1, p), \]

where $A + B$ denotes the sumset of $A$ and $B$.
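
For small moduli the inequality can be verified exhaustively. The sketch below (function names chosen here) checks every pair of non-empty subsets; it also shows that the statement fails for the composite modulus 4, for example with $A = B = \{0, 2\}$.

```python
from itertools import combinations

def sumset(A, B, p):
    """The sumset A + B in Z_p."""
    return {(a + b) % p for a in A for b in B}

def cauchy_davenport_holds(p):
    """Check |A + B| >= min(|A| + |B| - 1, p) for all non-empty A, B in Z_p."""
    subsets = [set(c) for k in range(1, p + 1) for c in combinations(range(p), k)]
    return all(
        len(sumset(A, B, p)) >= min(len(A) + len(B) - 1, p)
        for A in subsets
        for B in subsets
    )

print(cauchy_davenport_holds(5), cauchy_davenport_holds(4))
```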
94.2 Mann's theorem


Let $A$ and $B$ be subsets of $\mathbb{Z}$. If $0 \in A \cap B$, then
\[ \sigma(A + B) \ge \min(1, \sigma A + \sigma B), \]

where $\sigma$ denotes Schnirelmann density.


This statement was known also as the $(\alpha + \beta)$-conjecture until H. B. Mann proved it in 1942.
94.3 Schnirelmann density


Let $A$ be a subset of $\mathbb{Z}$, and let $A(n)$ be the number of elements of $A$ in $[1, n]$. The Schnirelmann
density of $A$ is
\[ \sigma A = \inf_n \frac{A(n)}{n} . \]
Schnirelmann density has the following properties:


1. $A(n) \ge n \sigma A$ for all $n$.
2. $\sigma A = 1$ if and only if $\mathbb{N} \subseteq A$.
3. if 1 does not belong to $A$, then $\sigma A = 0$.


Schnirelmann proved that if $0 \in A \cap B$ then
\[ \sigma(A + B) \ge \sigma A + \sigma B - \sigma A \cdot \sigma B \]

and also if $\sigma A + \sigma B \ge 1$, then $\sigma(A + B) = 1$. From these he deduced that if $\sigma A > 0$ then
$A$ is an additive basis.
94.4 Sidon set


A set of natural numbers is called a Sidon set if all pairwise sums of its elements are
distinct. Equivalently, the equation $a + b = c + d$ has only the trivial solution $\{a, b\} = \{c, d\}$
in elements of the set.


Sidon sets are a special case of so-called $B_h[g]$ sets. A set $A$ is called a $B_h[g]$ set if for every
$n \in \mathbb{N}$ the equation $n = a_1 + \ldots + a_h$ has at most $g$ different solutions with $a_1 \le \cdots \le a_h$
being elements of $A$. The Sidon sets are $B_2[1]$ sets.


Define $F_h(n, g)$ as the size of the largest $B_h[g]$ set contained in the interval $[1, n]$. Whereas
it is known that $F_2(n, 1) = n^{1/2} + O(n^{5/16})$ [2, p. 85], no asymptotical results are known for
$g > 1$ or $h > 2$ [1].


The infinite $B_h[g]$ sets are understood even worse. Erdős [2, p. 89] proved that for every
infinite Sidon set $A$ we have $\liminf_{n \to \infty} (n / \log n)^{-1/2}\, |A \cap [1, n]| < C$ for some constant $C$.
On the other hand, for a long time no example of a set for which $|A \cap [1, n]| > n^{1/3 + \epsilon}$ for
some $\epsilon > 0$ was known. Only recently Ruzsa [3] used an extremely clever construction to
prove the existence of a set $A$ for which $|A \cap [1, n]| > n^{\sqrt{2} - 1 - \epsilon}$ for every $\epsilon > 0$ and for all
sufficiently large $n$.
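
For a finite set the defining property is straightforward to test. In this sketch (the function name `is_sidon` is chosen here) sums of an element with itself are included among the pairwise sums, which matches the equation $a + b = c + d$ with $a = b$ allowed.

```python
from itertools import combinations_with_replacement

def is_sidon(elements):
    """True if all sums a + b with a <= b in the set are distinct."""
    sums = set()
    for a, b in combinations_with_replacement(sorted(set(elements)), 2):
        if a + b in sums:
            return False
        sums.add(a + b)
    return True

print(is_sidon([1, 2, 5, 11]), is_sidon([1, 2, 3, 4]))
```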
94.5 asymptotic density


Let $A$ be a subset of $\mathbb{Z}^+$. For any $n \in \mathbb{Z}^+$ put $A(n) = \{1, 2, \ldots, n\} \cap A$.


Define the upper asymptotic density $\overline{d}(A)$ of $A$ by

\[ \overline{d}(A) = \limsup_{n \to \infty} \frac{|A(n)|}{n} . \]

$\overline{d}(A)$ is also known simply as the upper density of $A$.
Similarly, we define $\underline{d}(A)$, the lower asymptotic density of $A$, by

\[ \underline{d}(A) = \liminf_{n \to \infty} \frac{|A(n)|}{n} . \]

We say $A$ has asymptotic density $d(A)$ if $\underline{d}(A) = \overline{d}(A)$, in which case we put $d(A) = \overline{d}(A)$.
94.6 discrete space


Definition Let $X$ be a set. Then the discrete topology for $X$ is the topology given by the
power set of $X$. A topological space equipped with the discrete topology is called a discrete
space.


In other words, the discrete topology is the finest topology one can give to a set.


Theorem The following conditions are equivalent:


1. $X$ is a discrete space.
2. Every singleton in $X$ is an open set.
3. If $A$ is a subset of $X$, and $x \in A$, then $A$ is a neighborhood of $x$.


Definition Suppose $X$ is a topological space and $Y$ is a subset equipped with the subspace topology.
If $Y$ is a discrete space, then $Y$ is called a discrete subspace of $X$.


Theorem Suppose $X$ is a topological space and $Y$ is a subset of $X$. Then $Y$ is a discrete
subspace if and only if for any $y \in Y$, there is an open subset $S$ of $X$ such that $S \cap Y = \{y\}$.


Example The set $\mathbb{Z}$ is a discrete subspace of $\mathbb{R}$ and $\mathbb{C}$.
94.7 essential component


If $A$ is a set of nonnegative integers such that
\[ \sigma(A + B) > \sigma B \tag{94.7.1} \]

for every set $B$ with Schnirelmann density $0 < \sigma B < 1$, then $A$ is an essential component.
Erdős proved that every basis is an essential component. In fact he proved that

\[ \sigma(A + B) \ge \sigma B + \frac{1}{2h} (1 - \sigma B)\, \sigma B , \]
where $h$ denotes the order of $A$.


Plünnecke improved that to
\[ \sigma(A + B) \ge \sigma B^{\,1 - 1/h} . \]


There are non-basic essential components. Linnik constructed a non-basic essential component
for which $A(n) = O(n^{\epsilon})$ for every $\epsilon > 0$.
94.8 normal order


Let $f(n)$ and $F(n)$ be functions from $\mathbb{Z}^+ \to \mathbb{R}$. We say that $f(n)$ has normal order $F(n)$ if
for each $\epsilon > 0$ the set

\[ A(\epsilon) = \{ n \in \mathbb{Z}^+ : (1 - \epsilon) F(n) < f(n) < (1 + \epsilon) F(n) \} \]
has the property that $\underline{d}(A(\epsilon)) = 1$. Equivalently, if $B(\epsilon) = \mathbb{Z}^+ \setminus A(\epsilon)$, then $\overline{d}(B(\epsilon)) = 0$.
(Note that $\underline{d}(X)$ denotes the lower asymptotic density of $X$, and $\overline{d}(X)$ the upper asymptotic density.)


We say that $f$ has average order $F$ if

\[ \sum_{j=1}^{n} f(j) \sim \sum_{j=1}^{n} F(j) . \]
Chapter 95


11B13 - Additive bases


95.1 Erdos-Turan conjecture


The Erdős-Turán conjecture asserts that there exists no asymptotic basis $A \subset \mathbb{N}_0$ of order 2 such that
its representation function
\[ r'_{A,2}(n) = \sum_{\substack{a_1 + a_2 = n \\ a_1 \le a_2}} 1 \]

is bounded.


Alternatively, the question can be phrased as whether there exists a power series $F$ with
coefficients 0 and 1 such that all coefficients of $F^2$ are greater than 0, but are bounded.


If we replace the set of nonnegative integers by the set of all integers, then the question was
settled by Nathanson [2] in the negative, that is, there exists a set $A \subset \mathbb{Z}$ such that $r'_{A,2}(n) = 1$.
95.2 additive basis


A subset $A$ of $\mathbb{Z}$ is an (additive) basis of order $n$ if

\[ nA = \mathbb{N} \cup \{0\}, \]

where $nA$ is the $n$-fold sumset of $A$. Usually it is assumed that 0 belongs to $A$ when saying that
$A$ is an additive basis.
95.3 asymptotic basis


A subset $A$ of $\mathbb{Z}$ is an asymptotic basis of order $n$ if the $n$-fold sumset $nA$ contains all
sufficiently large integers.
95.4 base conversion


The bases entry gives a good overview of the symbolic representation of numbers, something
which we use every day. This entry will give a simple overview and method of converting
numbers between bases: that is, taking one representation of a number and converting it to
another.


Perhaps the simplest way to explain base conversion is to describe the conversion to and
from base 10 (in which everyone is accustomed to performing arithmetic). We will begin
with the easier method, which is the conversion from some other base to base 10.


Conversion to Base 10


Suppose we have a number represented in base $b$. This number is given as a sequence of
symbols $s_n s_{n-1} \cdots s_2 s_1 . t_1 t_2 \cdots t_m$. This sequence represents

\[ s_n b^{n-1} + s_{n-1} b^{n-2} + \cdots + s_2 b + s_1 + t_1 b^{-1} + t_2 b^{-2} + \cdots + t_m b^{-m} . \]


This is straightforward enough. All we need to do is convert each symbol $s$ to its decimal
equivalent. Typically this is simple. The symbols 0, 1, ..., 9 usually represent the same value
in any other base. For $b > 10$, the letters of the alphabet begin to be used, with a = 10, b = 11,
and so on. This serves up to $b = 36$. Since the most common bases used are binary ($b = 2$),
octal ($b = 8$), decimal ($b = 10$), and hexadecimal ($b = 16$), this scheme is generally sufficient.


Once we map symbols to values, we can easily apply the formula above, adding and multiplying
as we go along.


Example of Conversion to Base 10 


1. Binary to Decimal Let b = 2. Convert 10010.011 to decimal. 


140010.01k = Peo 022? $e 1 ss od se +1.27 
1 1 
iGo ae 
BAE 
18.375 


2. Ternary to Decimal Let b = 3. Convert 10210.21 to decimal. 


    10210.21 = 1·3^4 + 0·3^3 + 2·3^2 + 1·3^1 + 0·3^0 + 2·3^{-1} + 1·3^{-2} 
             = 1·81 + 2·9 + 1·3 + 2/3 + 1/9 
             = 102.777777... 


Note that there is no exact decimal representation of the ternary value 10210.21. 
This happens often with conversions between bases. This is why many decimal values 
(such as 0.1) cannot be represented precisely in binary floating point (generally used 
in computers for non-integral arithmetic). 


3. Hexadecimal to Decimal Let b = 16. Convert 4ad9.e3 to decimal. 


    4ad9.e3 = 4·16^3 + 10·16^2 + 13·16^1 + 9·16^0 + 14·16^{-1} + 3·16^{-2} 
            = 4·4096 + 10·256 + 13·16 + 9 + 14/16 + 3/256 
            = 16384 + 2560 + 208 + 9 + 227/256 
            = 19161.88671875 


Iterative Method 


Base conversion by hand can become very tedious, so it makes sense to write a computer 
program to perform these operations. What follows is a simple algorithm that iterates 
through the symbols of a representation, accumulating a value as it goes along. Once all the 
symbols are iterated, the result is in the accumulator. This method only works for whole 
numbers. The fractional part of a number could be converted in the same manner, but it 
would have to be a separate process, working from the end of the number rather than the 
beginning. 

Algorithm PARSE(A, n, b) 

Input: Sequence A, where 0 ≤ A[i] < b for 1 ≤ i ≤ n, and b > 1 

Output: The value represented by A in base b (where A[1] is the left-most digit) 

    v ← 0 
    for i ← 1 to n do 
        v ← b × v + A[i] 
    PARSE ← v 
This is a simple enough method that it could even be done in one’s head. 
Example of Iterative Method 


Let b = 2. Convert 101101 to decimal. 


Remaining Digits    Accumulator 
101101              0 
01101               0 × 2 + 1 = 1 
1101                1 × 2 + 0 = 2 
101                 2 × 2 + 1 = 5 
01                  5 × 2 + 1 = 11 
1                   11 × 2 + 0 = 22 
(none)              22 × 2 + 1 = 45 


This is a bit easier than remembering that the first digit corresponds to the 2^5 = 32 place. 
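
The iterative method can be sketched in Python. This is only an illustration of the PARSE algorithm above; the function name `parse_digits` and the choice to pass digits as a list of integers (most significant first) are assumptions made for the example.

```python
def parse_digits(digits, base):
    """Return the whole number whose base-`base` digits are `digits`,
    given most significant digit first (hypothetical helper name)."""
    if base <= 1:
        raise ValueError("base must be greater than 1")
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is out of range for base {base}")
        # Shift the accumulator one place to the left, then add the digit.
        value = base * value + d
    return value


print(parse_digits([1, 0, 1, 1, 0, 1], 2))  # 45, as in the table above
```

Each pass through the loop corresponds to one row of the table.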


Conversion From Base 10 


Converting a value to some base b consists simply of applying the iterative method in 
reverse. As the iterative method for “parsing” a symbolic representation of a value consists 
of multiplication and addition, the iterative method for forming a symbolic representation 
of a value will consist of division and subtraction. 


Algorithm GENERATE(A, n, b) 
Input: Array A of sufficient size to store the representation, n > 0, b > 1 
Output: The representation of n in base b will be stored in array A in reverse order 

    i ← 1 
    while n > 0 do 
        A[i] ← n mod b    (remainder of n/b) 
        n ← n / b         (integral division) 
        i ← i + 1 


Example of Conversion from Base 10 


Convert the decimal value 20000 to hexadecimal. 


Value Sequence   | Symbolic Sequence 
{4, 14, 2, 0}    | 4e20 


Note how we obtain the symbols from the right. We get the next symbol (moving left) by 
taking the value of n mod 16. We then replace n with the whole part of n/16, and repeat 
until n = 0. 
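
The same procedure can be sketched in Python. The name `to_base` and the symbol table `SYMBOLS` are assumptions for this illustration; the loop is the GENERATE algorithm, with the digits reversed at the end so that they read left to right.

```python
SYMBOLS = "0123456789abcdefghijklmnopqrstuvwxyz"  # digit symbols for bases up to 36


def to_base(n, base):
    """Return the representation of the non-negative integer n in `base`
    as a string (hypothetical helper name)."""
    if not 2 <= base <= len(SYMBOLS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    symbols = []
    while n > 0:
        n, remainder = divmod(n, base)  # integral division and n mod base
        symbols.append(SYMBOLS[remainder])
    # The symbols were produced right to left, so reverse them.
    return "".join(reversed(symbols))


print(to_base(20000, 16))  # 4e20
```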


This isn’t as easy to do in one’s head as the other direction, though for small bases (e.g. 
b = 2) it is feasible. For example, to convert 20000 to binary: 


n Representation 
20000 

10000 0 

5000 00 

2500 000 

1250 0000 

625 0 0000 

312 10 0000 

156 010 0000 

78 0010 0000 

39 0 0010 0000 

19 10 0010 0000 

9 110 0010 0000 

4 1110 0010 0000 

2 0 1110 0010 0000 

1 00 1110 0010 0000 

0 100 1110 0010 0000 


Of course, remembering that many digits (15) might be difficult. 


Conversion Between Similar Bases 


The digits in the previous example are grouped into sets of four both to ease readability, and 
to highlight the relationship between binary and hexadecimal. Since 2^4 = 16, each group of 
four binary digits is the representation of a hexadecimal digit in the same position of the 
sequence. 

It is trivial to get the octal and hexadecimal representations of any number once one has the 
binary representation. For instance, since 2^3 = 8, the octal representation can be obtained 
by grouping the binary digits into groups of 3, and converting each group to an octal digit. 


100 111 000 100 000 = 47040 


Even base 2^5 = 32 could be obtained: 


10011 10001 00000 = jh0 
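
The grouping trick can be sketched in Python as follows. The function name `regroup_binary` and the symbol table are assumptions for this example; the binary string is padded on the left so that its length is a multiple of the group size k, and each group of k bits becomes one digit in base 2^k.

```python
SYMBOLS = "0123456789abcdefghijklmnopqrstuvwxyz"  # digit symbols for bases up to 36


def regroup_binary(bits, k):
    """Convert a binary digit string to base 2**k by grouping k bits at a
    time from the right (hypothetical helper; k is limited to 1..5 so that
    2**k stays within the 36 available symbols)."""
    if not 1 <= k <= 5:
        raise ValueError("k must be between 1 and 5")
    bits = bits.replace(" ", "")
    padded_length = -(-len(bits) // k) * k  # round up to a multiple of k
    bits = bits.zfill(padded_length)
    groups = (bits[i:i + k] for i in range(0, len(bits), k))
    return "".join(SYMBOLS[int(group, 2)] for group in groups)


print(regroup_binary("100 1110 0010 0000", 4))  # 4e20
print(regroup_binary("100 1110 0010 0000", 3))  # 47040
print(regroup_binary("100 1110 0010 0000", 5))  # jh0
```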
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95.5 sumset 


Let A_1, A_2, ..., A_n be subsets of an additive group G. The sumset 

    A_1 + A_2 + ... + A_n 

is the set of all elements of the form a_1 + a_2 + ... + a_n, where a_i ∈ A_i. 
In geometry a sumset is often called a Minkowski sum. 
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Chapter 96 


11B25 — Arithmetic progressions 


96.1 Behrend’s construction 


At first sight it may seem that the greedy algorithm yields the densest subset of {0, 1, ..., N} 
that is free of arithmetic progressions of length 3. It is not hard to show that the greedy 
algorithm yields the set of numbers that lack digit 2 in their ternary development. The density 
of such numbers is O(N^{log_3 2 − 1}). 


However, in 1946 Behrend [1] constructed much denser subsets that are free of arithmetic 
progressions. His major idea is that if we were looking for progression-free sets in R^n, then 
we could use spheres. So, consider a d-dimensional cube [1, n]^d ∩ Z^d and the family of spheres 
x_1^2 + x_2^2 + ... + x_d^2 = t for t = 1, ..., dn^2. Each point in the cube is contained in one of the 
spheres, and so at least one of the spheres contains n^d / (dn^2) lattice points. Let us call this set 
A. Since a sphere does not contain arithmetic progressions, A does not contain any progressions 
either. Now let f be a Freiman isomorphism from A to a subset of Z defined as follows. If 
x = (x_1, x_2, ..., x_d) is a point of A, then f(x) = x_1 + x_2 (2n) + x_3 (2n)^2 + ... + x_d (2n)^{d-1}, 
that is, we treat x_i as the i'th digit of f(x) in base 2n. It is not hard to see that f is indeed 
a Freiman isomorphism of order 2, and that f(A) ⊂ {1, 2, ..., N = (2n)^d}. If we set 
d = c √(ln N), then we get that there is a progression-free subset of {1, 2, ..., N} of size at 
least N e^{−√(ln N) (c ln 2 + 2/c + o(1))}. To maximize this value we can set c = √(2 / ln 2). Thus, there 
exists a progression-free set of size at least 


    N e^{−√(8 ln 2 ln N) (1 + o(1))} 


This result was later generalized to sets not containing arithmetic progressions of length k 
by Rankin [3]. His construction is more complicated, and depends on the estimates of the 
number of representations of an integer as a sum of many squares. He proves that the size 
of a set free of k-term arithmetic progressions is at least 


    N e^{−c (log N)^{1/(k−1)}} 

On the other hand, Moser [2] gave a construction analogous to that of Behrend, but which 
was explicit since it did not use the pigeonhole principle. 
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96.2 Freiman’s theorem 


Let A be a finite set of integers such that the 2-fold sumset 2A is “small”, i.e., |2A| < c|A| 
for some constant c. There exists an n-dimensional arithmetic progression of length c'|A| 
that contains A, and such that c' and n are functions of c only. 
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96.3 Szemerédi’s theorem 


Let k be a positive integer and let δ > 0. There exists a positive integer N = N(k, δ) such 
that every subset of {1, 2, ..., N} of size δN contains an arithmetic progression of length k. 


The case k = 3 was first proved by Roth [4]. His method did not seem to extend to the case 
k > 3. Using completely different ideas Szemerédi proved the case k = 4 [5], and the general 
case of an arbitrary k [6]. 


The best known bounds for N(k, δ) are 


    e^{c (log(1/δ))^{k−1}} ≤ N(k, δ) ≤ 2^{2^{δ^{−2^{2^{k+9}}}}}, 


where the lower bound is due to Behrend [1] (for k = 3) and Rankin [3], and the upper bound 
is due to Gowers [2]. 
For k = 3 a better upper bound was obtained by Bourgain 


N(3,6) < c6~7e?°, 
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96.4 multidimensional arithmetic progression 


An n-dimensional arithmetic progression is a set of the form 


    Q = Q(a; q_1, ..., q_n; l_1, ..., l_n) 
      = {a + x_1 q_1 + ... + x_n q_n | 0 ≤ x_i < l_i for i = 1, ..., n}. 


The length of the progression is defined as l_1 ··· l_n. The progression is proper if |Q| = 
l_1 ··· l_n. 
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Chapter 97 


11B34 — Representation functions 


97.1 Erdos-Fuchs theorem 


Let A be a set of natural numbers. Let R_n(A) be the number of ways to represent n as a 
sum of two elements in A, that is, 

    R_n(A) = Σ_{a_i + a_j = n; a_i, a_j ∈ A} 1. 


The Erdős-Fuchs theorem [1] [2] states that if c > 0, then 

    Σ_{n ≤ N} R_n(A) = cN + o(N^{1/4} log^{−1/2} N) 


cannot hold. 
On the other hand, Ruzsa [3] constructed a set for which 


    Σ_{n ≤ N} R_n(A) = cN + O(N^{1/4} log N). 
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Chapter 98 


11B37 — Recurrences 


98.1 Collatz problem 
We define the function f : N → N (where N excludes zero) such that 


    f(a) = 3a + 1    if a is odd 
    f(a) = a/2       if a is even. 


Then let the sequence c_n be defined as c_i = f(c_{i−1}), with c_0 an arbitrary natural seed value. 


It is conjectured that the sequence c_0, c_1, c_2, ... will always end in 1, 4, 2, repeating infinitely. 
This has been verified by computer up to very large values of c_0, but is unproven in general. 
It is also not known whether this problem is decidable. This is generally called the Collatz 
problem. 


The sequence cn is sometimes called the “hailstone sequence”. This is because it behaves 
analogously to a hailstone in a cloud which falls by gravity and is tossed up again repeatedly. 
The sequence similarly ends in an eternal oscillation. 
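
The hailstone sequence is easy to generate by machine. The sketch below is illustrative only: the function name `hailstone` is an assumption, and the loop stops when the sequence first reaches 1, so it would run forever on a seed that contradicts the conjecture (none is known).

```python
def hailstone(seed):
    """Return the Collatz sequence starting at `seed`, stopping at the
    first occurrence of 1 (hypothetical helper name)."""
    if seed < 1:
        raise ValueError("seed must be a positive integer")
    sequence = [seed]
    while sequence[-1] != 1:
        a = sequence[-1]
        # f(a) = 3a + 1 for odd a, a / 2 for even a
        sequence.append(3 * a + 1 if a % 2 else a // 2)
    return sequence


print(hailstone(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```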
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98.2 recurrence relation 


A recurrence relation is a function which gives the value of a sequence at some position 
based on the values of the sequence at previous positions and the position index itself. If 
the current position n of a sequence s is denoted by s_n, then the next value of the sequence 
expressed as a recurrence relation would be of the form 


    s_{n+1} = f(s_1, s_2, ..., s_{n−1}, s_n, n) 


where f is any function. An example of a simple recurrence relation is 


    s_{n+1} = s_n + (n + 1) 


which is the recurrence relation for the sum of the integers from 1 to n + 1. This could also 
be expressed as 


    s_n = s_{n−1} + n 


keeping in mind that as long as we set the proper initial values of the sequence, the recurrence 
relation indices can have any constant amount added or subtracted. 
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Chapter 99 


11B39 — Fibonacci and Lucas numbers 
and polynomials and generalizations 


99.1 Fibonacci sequence 


The Fibonacci sequence, discovered by Leonardo Pisano Fibonacci, begins 


0,1,1,2,3,5, 8, 13, 21, 34, 55, 89, 144, 233, 377, ... 


The nth Fibonacci number is generated by adding the previous two. Thus, the Fibonacci 
sequence has the recurrence relation 

    f_n = f_{n−1} + f_{n−2} 


with f_0 = 0 and f_1 = 1. This recurrence relation can be solved into the closed form 

    f_n = (1/√5) (φ^n − φ'^n) 


where φ is the golden ratio (also see this entry for an explanation of φ'). Note that 

    lim_{n→∞} f_{n+1} / f_n = φ. 
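
As a quick numerical check, the recurrence and the closed form can be compared in Python. This is a sketch only: the function names are assumptions, φ and φ' are taken to be (1 + √5)/2 and (1 − √5)/2 as in the golden ratio entry, and the closed form is evaluated in floating point, so it is rounded and should only be trusted for moderate n.

```python
from math import sqrt


def fib_recurrence(n):
    """n-th Fibonacci number from f_n = f_{n-1} + f_{n-2}, f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n):
    """n-th Fibonacci number from the closed form, rounded to the nearest
    integer to absorb floating-point error."""
    phi = (1 + sqrt(5)) / 2
    phi_prime = (1 - sqrt(5)) / 2
    return round((phi ** n - phi_prime ** n) / sqrt(5))


print([fib_recurrence(n) for n in range(10)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(fib_closed_form(14))                      # 377
```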
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99.2 Hogatt’s theorem 


Hogatt’s theorem states that every positive integer can be expressed as a sum of distinct 
Fibonacci numbers. 


For any positive integer k ∈ Z^+, there exists a unique positive integer n so that F_{n−1} < k ≤ 
F_n. We proceed by strong induction on n. For k = 0, 1, 2, 3, the property is true as 0, 1, 2, 3 
are themselves Fibonacci numbers. Suppose k ≥ 4 and that every integer less than k is a 
sum of distinct Fibonacci numbers. Let n be the largest positive integer such that F_n < k. 
We first note that if k − F_n > F_{n−1} then 


    F_{n+1} ≥ k > F_n + F_{n−1} = F_{n+1}, 


giving us a contradiction. Hence k − F_n ≤ F_{n−1} and consequently the positive integer 
(k − F_n) can be expressed as a sum of distinct Fibonacci numbers. Moreover, this sum does 
not contain the term F_n, as k − F_n ≤ F_{n−1} < F_n. Hence, k = (k − F_n) + F_n is a sum of distinct 
Fibonacci numbers and Hogatt’s theorem is proved by induction. 
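
The induction step suggests a greedy procedure: repeatedly subtract a large Fibonacci number and continue with the remainder. The Python sketch below is an illustration under that reading, not part of the proof; it uses the largest Fibonacci number not exceeding k (rather than strictly less than k), and the function name `fibonacci_sum` is an assumption.

```python
def fibonacci_sum(k):
    """Return distinct Fibonacci numbers (largest first) that sum to the
    positive integer k, chosen greedily (hypothetical helper name)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    fibs = [1, 2]  # distinct Fibonacci values 1, 2, 3, 5, ...
    while fibs[-1] + fibs[-2] <= k:
        fibs.append(fibs[-1] + fibs[-2])
    terms = []
    for f in reversed(fibs):
        if f <= k:  # take the largest Fibonacci number that still fits
            terms.append(f)
            k -= f
    return terms


print(fibonacci_sum(100))  # [89, 8, 3]
```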
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99.3 Lucas numbers 


The Lucas numbers are a slight variation of the Fibonacci numbers. These numbers follow the 
same recursion: 

    l_{n+1} = l_n + l_{n−1} 


but with different initial conditions: l_1 = 1, l_2 = 3, leading to the sequence 1, 3, 4, 7, 11, 18, 29, 47, 76, 123, ... 


Lucas numbers satisfy the following property: l_n = f_{n−1} + f_{n+1}, where f_n is the nth Fibonacci 
number. 
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99.4 golden ratio 


The “Golden Ratio”, or φ, has the value 


1.61803398874989484820... 


φ gets its rather illustrious name from the fact that the Greeks thought that a rectangle 
with ratio of side lengths of about 1.6 was the most pleasing to the eye. Classical Greek 
architecture is based on this premise. 

Figure: The golden rectangle; l/w = φ. 

φ has plenty of interesting mathematical properties, however. Its value is exactly 


    (1 + √5) / 2 


The value 

    (1 − √5) / 2 

is often called φ'. φ and φ' are the two roots of the recurrence relation given by the 
Fibonacci sequence. The following identities hold for φ and φ': 


• 1/φ = −φ' 
• 1 − φ = φ' 
• 1/φ' = −φ 
• 1 − φ' = φ 


and so on. These give us 

    φ^{−1} + φ^0 = φ^1 

which implies 

    φ^{n−1} + φ^n = φ^{n+1} 
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Chapter 100 


11B50 — Sequences (mod m) 


100.1 Erdos-Ginzburg-Ziv theorem 


If a_1, a_2, ..., a_{2n−1} is a set of integers, then there exists a subset a_{i_1}, a_{i_2}, ..., a_{i_n} of n integers 
such that 

    a_{i_1} + a_{i_2} + ... + a_{i_n} ≡ 0 (mod n). 
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Chapter 101 


11B57 — Farey sequences; the 
sequences 1^k, 2^k, ... 


101.1 Farey sequence 


The n’th Farey sequence is the ascending sequence of all rationals {0 ≤ a/b ≤ 1 : b ≤ n}. 


The first 5 Farey sequences are 


    F_1:  0/1, 1/1 
    F_2:  0/1, 1/2, 1/1 
    F_3:  0/1, 1/3, 1/2, 2/3, 1/1 
    F_4:  0/1, 1/4, 1/3, 1/2, 2/3, 3/4, 1/1 
    F_5:  0/1, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1/1 


Farey sequences are a singularly useful tool in understanding the convergents that appear in 
continued fractions. The convergents for any irrational α can be found: they are precisely 
the closest numbers to α on the sequences F_n. 


It is also of value to look at the sequences F_n as n grows. If a/b and c/d are reduced 
representations of adjacent terms in some Farey sequence F_n (where b, d ≤ n), then they are 
adjacent fractions; their difference is the least possible: 


    |a/b − c/d| = 1/(bd). 


Furthermore, the first fraction to appear between the two in a Farey sequence is (a + c)/(b + d), in 
sequence F_{b+d}, and (as written here) this fraction is already reduced. 


An alternate view of the “dynamics” of how Farey sequences develop is given by Stern-Brocot 
trees. 
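
The definition and the adjacency property can be checked with a short Python sketch. The function name `farey` is an assumption; the sequence is built directly from the definition using exact rational arithmetic, which is simple rather than efficient.

```python
from fractions import Fraction


def farey(n):
    """Return the n-th Farey sequence as an ascending list of Fractions
    (hypothetical helper; built straight from the definition)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    terms = {Fraction(a, b) for b in range(1, n + 1) for a in range(b + 1)}
    return sorted(terms)


f5 = farey(5)
print(", ".join(f"{t.numerator}/{t.denominator}" for t in f5))

# Neighbouring terms a/b < c/d differ by exactly 1/(bd).
for left, right in zip(f5, f5[1:]):
    assert right - left == Fraction(1, left.denominator * right.denominator)
```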
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Chapter 102 


11B65 — Binomial coefficients; 
factorials; q-identities 


102.1 Lucas’s Theorem 


Let m, n ∈ N − {0} be two natural numbers. If p is a prime number and 

    m = a_k p^k + a_{k−1} p^{k−1} + ... + a_1 p + a_0,    n = b_k p^k + b_{k−1} p^{k−1} + ... + b_1 p + b_0 


are the base-p expansions of m and n, then the following congruence is true: 


    \binom{m}{n} ≡ \binom{a_0}{b_0} \binom{a_1}{b_1} ··· \binom{a_k}{b_k} (mod p). 


Note: the binomial coefficient is defined in the usual way, namely: 


    \binom{x}{y} = x! / (y! (x − y)!) 


if x ≥ y and 0 otherwise (of course, x and y are natural numbers). 
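
Lucas's theorem gives a practical way to reduce a binomial coefficient modulo a prime without computing the full coefficient. The sketch below is illustrative; the function name `binomial_mod_p` is an assumption, and `math.comb` (which returns 0 when the lower argument exceeds the upper one) supplies the small digit-wise coefficients.

```python
from math import comb


def binomial_mod_p(m, n, p):
    """Compute C(m, n) mod p for a prime p by multiplying the binomial
    coefficients of the base-p digits of m and n (hypothetical helper)."""
    result = 1
    while m > 0 or n > 0:
        m, a = divmod(m, p)  # a is the next base-p digit of m
        n, b = divmod(n, p)  # b is the next base-p digit of n
        result = result * comb(a, b) % p
    return result


print(binomial_mod_p(10, 3, 7), comb(10, 3) % 7)  # both print 1
```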
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102.2 binomial theorem 


The binomial theorem is a formula for the expansion of (a + b)^n, for n a positive integer and 
a and b any two real (or complex) numbers, into a sum of powers of a and b. More precisely, 


    (a + b)^n = a^n + \binom{n}{1} a^{n−1} b + \binom{n}{2} a^{n−2} b^2 + ... + b^n. 


For example, if n is 3 or 4, we have: 


    (a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3 


    (a + b)^4 = a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4. 
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Chapter 103 


11B68 — Bernoulli and Euler numbers 
and polynomials 


103.1 Bernoulli number 


Let B_r(x) be the rth Bernoulli periodic function. Then the rth Bernoulli number is 

    B_r := B_r(0). 

One can see that B_{2r+1} = 0 for r ≥ 1. Numerically, B_0 = 1, B_1 = −1/2, B_2 = 1/6, B_4 = −1/30, ... 
Version: 4 Owner: KimJ Author(s): KimJ 

103.2 Bernoulli periodic function 

Let b_r be the rth Bernoulli polynomial. Then the rth Bernoulli periodic function B_r(x) is 
defined as the periodic function of period 1 which coincides with b_r on [0, 1]. 
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103.3 Bernoulli polynomial 


The Bernoulli polynomials are the sequence {b_r(x)}_{r=0}^{∞} of polynomials defined on [0, 1] by 
the conditions: 


    b_0(x) = 1, 
    b_r'(x) = r b_{r−1}(x),  r ≥ 1, 
    ∫_0^1 b_r(x) dx = 0,  r ≥ 1. 


These assumptions imply the identity 

    Σ_{r=0}^{∞} b_r(x) y^r / r! = y e^{xy} / (e^y − 1), 


allowing us to calculate the b_r. We have 


    b_0(x) = 1 
    b_1(x) = x − 1/2 
    b_2(x) = x^2 − x + 1/6 
    b_3(x) = x^3 − (3/2) x^2 + (1/2) x 
    b_4(x) = x^4 − 2x^3 + x^2 − 1/30 
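
The three defining conditions are enough to compute the polynomials mechanically: integrate r times the previous polynomial, then pick the constant term so that the integral over [0, 1] vanishes. The Python sketch below follows that recipe with exact fractions; the function name and the coefficient-list representation (constant term first) are assumptions for the example.

```python
from fractions import Fraction


def bernoulli_polynomials(count):
    """Return the first `count` Bernoulli polynomials as coefficient lists
    [c_0, c_1, ...] meaning c_0 + c_1 x + ... (hypothetical helper)."""
    polys = [[Fraction(1)]]  # b_0(x) = 1
    for r in range(1, count):
        previous = polys[-1]
        # Antiderivative of r * b_{r-1}(x); slot 0 holds the unknown constant.
        coeffs = [Fraction(0)] + [r * c / (i + 1) for i, c in enumerate(previous)]
        # Choose the constant so that the integral of b_r over [0, 1] is zero.
        coeffs[0] = -sum(c / (j + 1) for j, c in enumerate(coeffs))
        polys.append(coeffs)
    return polys


for r, poly in enumerate(bernoulli_polynomials(5)):
    print(r, [str(c) for c in poly])
```

The constant terms of the results are the Bernoulli numbers listed earlier (1, −1/2, 1/6, 0, −1/30).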
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103.4 generalized Bernoulli number 


Let χ be a non-trivial primitive character mod m. The generalized Bernoulli numbers B_{n,χ} 
are given by 


    Σ_{a=1}^{m} χ(a) t e^{at} / (e^{mt} − 1) = Σ_{n=0}^{∞} B_{n,χ} t^n / n!. 


They are members of the field Q(χ) generated by the values of χ. 
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Chapter 104 


11B75 — Other combinatorial number 
theory 


104.1 Erdos-Heilbronn conjecture 


Let A ⊂ Z_p be a set of residues modulo p with |A| = k, and let h be a positive integer. Then 

    h^A = {a_1 + a_2 + ... + a_h | a_1, a_2, ..., a_h are distinct elements of A} 


has cardinality at least min(p, hk − h^2 + 1). This was conjectured by Erdős and Heilbronn 
in 1964 [1]. The first proof was given by Dias da Silva and Hamidoune in 1994. 
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104.2 Freiman isomorphism 


Let A and B be subsets of abelian groups G_A and G_B respectively. A Freiman isomorphism 
of order s is a bijective mapping f : A → B such that 


    a_1 + a_2 + ... + a_s = a'_1 + a'_2 + ... + a'_s 

holds if and only if 


    f(a_1) + f(a_2) + ... + f(a_s) = f(a'_1) + f(a'_2) + ... + f(a'_s). 


The Freiman isomorphism is a restriction of the conventional notion of a group isomorphism 
to a limited number of group operations. In particular, a Freiman isomorphism of order s is 
also a Freiman isomorphism of order s − 1, and the mapping is a Freiman isomorphism of 
every order precisely when it is the conventional isomorphism. 


Freiman isomorphisms were introduced by Freiman in his monograph [1] to build a general 
theory of set addition that is independent of the underlying group. 
The number of under Freiman isomorphisms of order 2 is n?”@+°0)) [9]. 
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104.3 sum-free 


A set A is called sum-free if the equation a_1 + a_2 = a_3 does not have solutions in elements 
of A. Equivalently, a set is sum-free if it is disjoint from its 2-fold sumset, i.e., A ∩ 2A = ∅. 
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Chapter 105 


11B83 — Special sequences and 
polynomials 


105.1 Beatty sequence 


jcc 


is called the Beatty sequence with density a, slope Ł, offset a’, and y-intercept =£, 


The 


Sometimes a sequence of the above type is called a Beatty sequence, and denoted 
BW) (a, a’), while an integer sequence 


is called a [ceiling] Beatty sequence. 
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105.2 Beatty’s theorem 


If p and q are positive irrationals such that 

    1/p + 1/q = 1 

then the sequences 

    {⌊np⌋}_{n=1}^{∞} = ⌊p⌋, ⌊2p⌋, ⌊3p⌋, ... 
    {⌊nq⌋}_{n=1}^{∞} = ⌊q⌋, ⌊2q⌋, ⌊3q⌋, ... 

where ⌊x⌋ denotes the floor (or greatest integer function) of x, constitute a partition of the 
set of positive integers. 


That is, every positive integer is a member exactly once of one of the two sequences and the 
two sequences have no common terms. 
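
A concrete instance can be checked exactly in Python. The choice p = √2 and q = 2 + √2 is an assumption made for this example (it satisfies 1/p + 1/q = 1); with it, ⌊n√2⌋ equals the integer square root of 2n², so no floating-point arithmetic is needed. The function name `beatty_pair` is likewise hypothetical.

```python
from math import isqrt


def beatty_pair(count):
    """Return the first `count` terms of floor(n*sqrt(2)) and of
    floor(n*(2 + sqrt(2))), computed exactly with integer square roots."""
    a = [isqrt(2 * n * n) for n in range(1, count + 1)]
    b = [2 * n + isqrt(2 * n * n) for n in range(1, count + 1)]
    return a, b


a, b = beatty_pair(10)
print(a)  # [1, 2, 4, 5, 7, 8, 9, 11, 12, 14]
print(b)  # [3, 6, 10, 13, 17, 20, 23, 27, 30, 34]
```

Merging the two lists and keeping the values up to the number of terms generated should give each positive integer exactly once, as the theorem asserts.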
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105.3 Fraenkel’s partition theorem 


Fraenkel’s partition theorem is a generalization of Beatty’s theorem. Set 


    B(α, α') := (⌊(n − α')/α⌋)_{n=1}^{∞}. 


We say that two sequences partition N = {1, 2, 3, ...} if the sequences are disjoint and their 
union is N. 


Fraenkel’s Partition Theorem: The sequences B(α, α') and B(β, β') partition N if and 
only if the following five conditions are satisfied. 

1. 0 < α < 1. 
2. α + β = 1. 
3. 0 ≤ α + α' ≤ 1. 
4. If α is irrational, then α' + β' = 0 and kα + α' ∉ Z for 2 ≤ k ∈ N. 
5. If α is rational (say q ∈ N is minimal with qα ∈ N), then 1/q ≤ α + α' and ⌈qα'⌉ + ⌈qβ'⌉ = 1. 
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105.4 Sierpinski numbers 


An integer k is a Sierpinski number if for every positive integer n, the number k·2^n + 1 is 
composite. 

That such numbers exist is amazing, and even more surprising is that there are infinitely 
many of them (in fact, infinitely many odd ones). The smallest known Sierpinski number 
is 78557, but it is not known whether or not this is the smallest one. The smallest number 
m for which it is unknown whether or not m is a Sierpinski number is 4847. 


A process for generating Sierpinski numbers using covering sets of can be found at 
Visit 


for the distributed computing effort to show that 78557 is indeed the smallest Sierpinski 
number (or find a smaller one). 


Similarly, a Riesel number is a number k such that for every positive integer n, the number 
k·2^n − 1 is composite. The smallest known Riesel number is 509203, but again, it is not known 
for sure that this is the smallest. 
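
The covering-set idea can be illustrated for k = 78557. The set of primes used below, {3, 5, 7, 13, 19, 37, 73}, is not given in this entry; it is the covering set usually quoted for 78557 and is stated here as an assumption. The sketch checks, for a range of exponents n, that at least one of these primes divides 78557·2^n + 1; the function name is hypothetical.

```python
COVERING_PRIMES = (3, 5, 7, 13, 19, 37, 73)  # assumed covering set for k = 78557


def covering_factor(k, n, primes=COVERING_PRIMES):
    """Return a prime from `primes` dividing k * 2**n + 1, or None if no
    prime in the set divides it (hypothetical helper)."""
    for p in primes:
        # Work modulo p so that large exponents stay cheap.
        if (k * pow(2, n, p) + 1) % p == 0:
            return p
    return None


for n in range(1, 8):
    print(n, covering_factor(78557, n))
```

A finite check like this is only evidence. The actual argument observes that the residues of 2^n modulo each prime are periodic, so checking one full common period covers every n.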
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105.5 palindrome 


A palindrome is a number which yields itself when its digits are reversed. Some palindromes 
are : 


• 121 

• 2002 

• 314159951413 

Clearly one can construct arbitrary-length palindromes by taking any number and 
appending to it a reversed copy of itself or of all but the last digit. 


The concept of palindromes can also be extended to sequences and strings. 


Version: 1 Owner: akrowne Author(s): akrowne 


105.6 proof of Beatty’s theorem 


We define a_n := np and b_n := nq. Since p and q are irrational, so are a_n and b_n. 


It is also the case that a_n ≠ b_m for all m and n, for if np = mq then q = 1 + n/m would be 
rational. 


The theorem is equivalent to the statement that for each integer N ≥ 1 exactly 1 element 
of {a_n} ∪ {b_n} lies in (N, N + 1). 


Choose an integer N. Let s(N) be the number of elements of {a_n} ∪ {b_n} less than N. 


    a_n < N  ⟺  np < N  ⟺  n < N/p 


So there are ⌊N/p⌋ elements of {a_n} less than N and likewise ⌊N/q⌋ elements of {b_n}. 


By definition, 


    N/p − 1 < ⌊N/p⌋ < N/p 
    N/q − 1 < ⌊N/q⌋ < N/q 


and summing these inequalities (using 1/p + 1/q = 1) gives N − 2 < s(N) < N, which gives 
that s(N) = N − 1 since s(N) is an integer. 


The number of elements of {a_n} ∪ {b_n} lying in (N, N + 1) is then s(N + 1) − s(N) = 1. 


Version: 4 Owner: lieven Author(s): lieven 




105.7 square-free sequence 


A square-free sequence is a sequence which has no repeating subsequences of any 
length. 


The name “square-free” comes from notation: Let {s} be a sequence. Then {s, s} is also a 
sequence, which we write “compactly” as {s^2}. In the rest of this entry we use a compact 
notation, lacking commas or braces. This notation is commonly used when dealing with 
sequences in the capacity of strings. Hence we can write {s, s} = ss = s^2. 


Some examples: 


• xabcabcx = x(abc)^2 x, not a square-free sequence. 


• abcdabc cannot have any subsequence written in square notation, hence it is a 
square-free sequence. 


• ababab = (ab)^3 = ab(ab)^2, not a square-free sequence. 


Note that, while notationally similar to the number-theoretic sense of “square-free,” the two 
concepts are distinct. For example, for integers a and b the product aba = a^2 b contains the 
square a^2. 


But as a sequence, aba = {a, b, a}, lacking any commutativity that might allow us to 
shift elements. Hence, the sequence aba is square-free. 
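
The examples above can be checked with a brute-force test. The sketch assumes, as the examples suggest, that a "repeating subsequence" means a block immediately followed by an identical block; the function name `is_square_free` is an assumption.

```python
def is_square_free(seq):
    """Return True if no block of `seq` is immediately followed by an
    identical block (hypothetical helper; works on strings, lists, tuples)."""
    n = len(seq)
    for length in range(1, n // 2 + 1):
        for start in range(n - 2 * length + 1):
            first = seq[start:start + length]
            second = seq[start + length:start + 2 * length]
            if first == second:
                return False  # found a square: first followed by itself
    return True


for word in ("xabcabcx", "abcdabc", "ababab", "aba"):
    print(word, is_square_free(word))
```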
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105.8 superincreasing sequence 


A sequence s_1, s_2, ... is superincreasing if 


    s_n > Σ_{i=1}^{n−1} s_i 

That is, any element of the sequence is greater than all of the previous elements added 
together. A commonly used superincreasing sequence is that of powers of two (s_i = 2^i). 


Suppose x = Σ_{i=1}^{n} a_i s_i. If s is a superincreasing sequence and every a_i ∈ {0, 1}, then we can 
always determine the a_i’s simply by knowing x (this is analogous to the fact that we can 
always determine which bits are on and off in the binary bitstring representing a number, 
given that number). 
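
One way to recover the a_i is to work from the largest term downward: a term is used exactly when it does not exceed what is left of x. The Python sketch below illustrates this; the function name `decode` and the sample sequence are assumptions for the example.

```python
def decode(x, s):
    """Recover the 0/1 coefficients a_i with x = sum(a_i * s_i), where s is
    a superincreasing sequence of positive integers (hypothetical helper)."""
    coefficients = [0] * len(s)
    for i in range(len(s) - 1, -1, -1):
        # s[i] exceeds the sum of all earlier terms, so it must be used
        # whenever it fits into the remaining value.
        if s[i] <= x:
            coefficients[i] = 1
            x -= s[i]
    if x != 0:
        raise ValueError("x is not a 0/1 combination of the sequence")
    return coefficients


print(decode(40, [2, 3, 7, 15, 31]))  # [1, 0, 1, 0, 1], since 40 = 2 + 7 + 31
```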
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Chapter 106 


11B99 — Miscellaneous 


106.1 Lychrel number 


A Lychrel number is a number which never yields a palindrome in the iterative process 
of adding to itself a copy of itself with digits reversed. For example, if we start with the 
number 983 we get: 


• 983 + 389 = 1372 
• 1372 + 2731 = 4103 
• 4103 + 3014 = 7117 


So in 3 steps we get a palindrome, hence 983 is not a Lychrel number. 
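
The reverse-and-add process is simple to automate. In the sketch below the function name and the step limit `max_steps` are assumptions; a number that does not reach a palindrome within the limit is only a Lychrel candidate, not a proven Lychrel number.

```python
def reverse_and_add(n, max_steps=1000):
    """Apply reverse-and-add to n in base 10. Return (steps, palindrome) for
    the first palindrome reached, or None if none appears within
    max_steps iterations (hypothetical helper)."""
    for step in range(1, max_steps + 1):
        n += int(str(n)[::-1])  # add the digit-reversed copy
        if str(n) == str(n)[::-1]:
            return step, n
    return None


print(reverse_and_add(983))       # (3, 7117)
print(reverse_and_add(196, 100))  # None: no palindrome within 100 steps
```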


In fact, it is not known if there exist any Lychrel numbers in base 10 (in base 2, for instance, 
there have been numbers proven to be Lychrel numbers; see note 1). The first Lychrel candidate is 
196: 


• 196 + 691 = 887 
• 887 + 788 = 1675 
• 1675 + 5761 = 7436 
• 7436 + 6347 = 13783 
• 13783 + 38731 = 52514 
• 52514 + 41525 = 94039 
• 94039 + 93049 = 187088 
• 187088 + 880781 = 1067869 


This has been followed out to millions of digits, with no palindrome found in the sequence. 


Note 1: Ronald Sprague has proved that the number 10110 in base 2 is a Lychrel number. 


The following table gives the number of Lychrel candidates found within ascending ranges: 


Range Possible Lychrels 
0 - 100 0 
100 - 1,000 2 
1,000 - 10,000 3 
10,000 - 100,000 69 
100,000 - 1,000,000 99 
10,000,000 - 100,000,000 1728 
100,000,000 - 1,000,000,000 29,813 
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106.2 closed form 


A closed form function which gives the value of a sequence at index n has only one 
parameter, n itself. This is in contrast to the recurrence relation form, which can have all of the 
previous values of the sequence as parameters. 


The benefit of the closed form is that one does not have to calculate all of the previous values 
of the sequence to get the next value. This is not too useful if one wants to print out or 
utilize all of the values of a sequence up to some n, but it is very useful to get the value of 
the sequence just at some index n. 


There are many techniques used to find a closed-form solution for a recurrence relation. 
Some are: 

• Repeated substitution. Replace each s_k in the expression of s_n (with k < n) with 
its recurrence relation. Repeat again on the resulting expression, until 
some pattern is evident. 


• Estimate an upper bound for s_n in terms of n. Then, solve for the unknowns (say there 
are r unknowns) by finding the first r values of the recurrence relation and solving the 
linear system formed by them and the unknowns. 


• Find the characteristic equation of the recurrence relation and solve for the roots. If 
the recurrence relation is not homogeneous, then a further method has to be applied. 
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Chapter 107 


11C08 — Polynomials 


107.1 content of a polynomial 


Let P = a_0 + a_1 x + ... + a_n x^n ∈ Z[x] be a polynomial with integer coefficients. The content 
of P is the greatest common divisor of the coefficients of P. 


    c(P) = gcd(a_0, a_1, ..., a_n) 


Version: 1 Owner: Daume Author(s): Daume 


107.2 cyclotomic polynomial 


For any positive integer n, we define Φ_n(x), the nth cyclotomic polynomial, by 


    Φ_n(x) = ∏_{1 ≤ j ≤ n, (j, n) = 1} (x − ζ_n^j) 

where ζ_n = e^{2πi/n}, i.e. ζ_n is an nth root of unity. 
Φ_n(x) is an irreducible polynomial of degree φ(n) in Q[x] for all n ∈ Z^+. 


Version: 3 Owner: saforres Author(s): saforres 




107.3 height of a polynomial 


Let P = a_0 + a_1 x + ... + a_n x^n ∈ C[x] be a polynomial with complex coefficients. The height 
of P is 


    H(P) = max{|a_0|, |a_1|, ..., |a_n|}. 
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107.4 length of a polynomial 


Let P = a_0 + a_1 x + ... + a_n x^n ∈ C[x] be a polynomial with complex coefficients. The length 
of P is 


    L(P) = |a_0| + |a_1| + ... + |a_n| = Σ_{i=0}^{n} |a_i| 
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107.5 proof of Eisenstein criterion 


Let f(x) ∈ R[x] be a polynomial satisfying Eisenstein’s criterion with prime p. 


Suppose that f(x) = g(x)h(x) with g(x), h(x) ∈ F[x], where F is the field of fractions of R. 
A lemma of Gauss states that there exist g'(x), h'(x) ∈ R[x] such that f(x) = g'(x)h'(x), i.e. 
any factorization can be converted to a factorization in R[x]. 


Let f(x) = Σ_{i=0}^{n} a_i x^i, g'(x) = Σ_{j=0}^{ℓ} b_j x^j, h'(x) = Σ_{k=0}^{m} c_k x^k be the expansions of f(x), g'(x), 
and h'(x) respectively. 


Let φ : R[x] → R/pR[x] be the natural homomorphism from R[x] to R/pR[x]. Note that 
since p | a_i for i < n and p ∤ a_n, we have φ(a_i) = 0 for i < n and φ(a_n) = α ≠ 0, so 


    φ(f(x)) = φ(Σ_{i=0}^{n} a_i x^i) = Σ_{i=0}^{n} φ(a_i) x^i = φ(a_n) x^n = α x^n. 


Therefore we have α x^n = φ(f(x)) = φ(g'(x)h'(x)) = φ(g'(x)) φ(h'(x)), so we must have 
φ(g'(x)) = β x^{ℓ'} and φ(h'(x)) = γ x^{m'} for some β, γ ∈ R/pR and some integers ℓ', m'. 


Clearly ℓ' ≤ deg(g'(x)) = ℓ and m' ≤ deg(h'(x)) = m, and therefore since ℓ' + m' = n = ℓ + m, 
we must have ℓ' = ℓ and m' = m. Thus φ(g'(x)) = β x^ℓ and φ(h'(x)) = γ x^m. 

If ℓ > 0, then φ(b_i) = 0 for i < ℓ. In particular, φ(b_0) = 0, hence p | b_0. Similarly if m > 0, 
then p | c_0. 


Since f(x) = g'(x)h'(x), by equating coefficients we see that a_0 = b_0 c_0. 


If ℓ > 0 and m > 0, then p | b_0 and p | c_0, which implies that p^2 | a_0. But this contradicts 
our assumptions on f(x), and therefore we must have ℓ = 0 or m = 0, that is, we must have 
a trivial factorization. Therefore f(x) is irreducible. 
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107.6 proof that the cyclotomic polynomial is irreducible 


We first prove that Φ_n(x) ∈ Z[x]. The extension Q(ζ_n) of Q is the splitting field of the 
polynomial x^n − 1 ∈ Q[x], since it splits this polynomial and is generated as an algebra by a 
single root of the polynomial. Since splitting fields are normal, the extension Q(ζ_n)/Q is a 
Galois extension. Any element of the Galois group, being a field automorphism, must map 
ζ_n to another root of unity of exact order n. Therefore, since the Galois group of Q(ζ_n)/Q 
permutes the roots of Φ_n(x), it must fix the coefficients of Φ_n(x), so by Galois theory these 
coefficients are in Q. Moreover, since the coefficients are algebraic integers, they must be in 
Z as well. 


Let f(x) be the minimal polynomial of ζ_n in Q[x]. Then f(x) has integer coefficients as well, 
since ζ_n is an algebraic integer. We will prove f(x) = Φ_n(x) by showing that every root of 
Φ_n(x) is a root of f(x). We do so via the following claim: 


Claim: For any prime p not dividing n, and any primitive nth root of unity ζ ∈ C, if 
f(ζ) = 0 then f(ζ^p) = 0. 


This claim does the job, since we know f(ζ_n) = 0, and any other primitive nth root of unity 
can be obtained from ζ_n by successively raising it to powers p, for primes p not dividing n, 
a finite number of times (see note 1). 


To prove this claim, consider the factorization x^n − 1 = f(x)g(x) for some polynomial g(x) ∈ 
Z[x]. Writing O for the ring of integers of Q(ζ_n), we treat the factorization as taking place in 
O[x] and proceed to mod out both sides of the factorization by any prime ideal 𝔭 of O lying 
over (p). Note that the polynomial x^n − 1 has no repeated roots mod 𝔭, since its derivative 
n x^{n−1} is relatively prime to x^n − 1 mod 𝔭. Therefore, if f(ζ) = 0 mod 𝔭, then g(ζ) ≠ 0 mod 𝔭, 
and applying the pth power Frobenius map to both sides yields g(ζ^p) ≠ 0 mod 𝔭. This means 
that g(ζ^p) cannot be 0 in C, because it doesn’t even equal 0 mod 𝔭. However, ζ^p is a root of 
x^n − 1, so if it is not a root of g, it must be a root of f, and so we have f(ζ^p) = 0, as desired. 


Note 1: Actually, if one applies Dirichlet’s theorem on primes in arithmetic progressions here, it 
turns out that one prime is enough, but we do not need such a sharp result here. 
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Chapter 108 


11D09 — Quadratic and bilinear 
equations 


108.1 Pell’s equation and simple continued fractions 


Let d be a positive integer which is not a perfect square, and let (x, y) be a solution of 
x^2 − dy^2 = 1. Then x/y is a convergent in the simple continued fraction expansion of √d. 


Suppose we have a non-trivial solution x, y of Pell’s equation, i.e. y ≠ 0. Let x, y both be 
positive integers. From 


    (x/y)^2 = d + 1/y^2 

we see that (x/y)^2 > d, so we have x/y > √d. So we get 


    |x/y − √d| = 1 / (y^2 (x/y + √d)) < 1 / (y^2 · 2√d) < 1 / (2y^2). 


This implies that x/y is a convergent of the continued fraction of √d. 
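
Because every positive solution appears among the convergents, the smallest one can be found by walking through the continued fraction expansion of √d. The Python sketch below uses the standard integer recurrence for the partial quotients of a quadratic irrational; the function name `pell_fundamental` and the variable names are assumptions for the example.

```python
from math import isqrt


def pell_fundamental(d):
    """Return the smallest positive (x, y) with x*x - d*y*y == 1 by testing
    successive convergents of sqrt(d) (hypothetical helper)."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, den, a = 0, 1, a0    # state of the continued fraction expansion
    h_prev, h = 1, a0       # numerators of consecutive convergents
    k_prev, k = 0, 1        # denominators of consecutive convergents
    while h * h - d * k * k != 1:
        m = den * a - m
        den = (d - m * m) // den
        a = (a0 + m) // den              # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k


print(pell_fundamental(2))   # (3, 2)
print(pell_fundamental(7))   # (8, 3)
print(pell_fundamental(61))  # (1766319049, 226153980)
```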
Chapter 109


11D41 — Higher degree equations; 
Fermat’s equation 


109.1 Beal conjecture 


The Beal conjecture states: 
Let $A, B, C, x, y, z$ be nonzero integers such that $x$, $y$, and $z$ are all $\ge 3$, and
$$A^x + B^y = C^z \qquad (109.1.1)$$
Then $A$, $B$, and $C$ (or any two of them) are not relatively prime.
It is clear that the famous statement known as Fermat’s last theorem would follow from this
stronger claim.
Solutions of equation (109.1.1) are not very scarce. One parametric solution is

$$[a(a^m + b^m)]^m + [b(a^m + b^m)]^m = (a^m + b^m)^{m+1}$$

for $m \ge 3$, and $a, b$ such that the terms are nonzero. But computerized searching brings
forth quite a few additional solutions, such as:
$3^3 + 6^3 = 3^5$
$3^9 + 54^3 = 3^{11}$
$3^6 + 18^3 = 3^8$
$7^6 + 7^7 = 98^3$
$27^4 + 162^3 = 9^7$
$211^3 + 3165^3 = 422^4$
$386^3 + 4825^3 = 579^4$
$307^3 + 614^4 = 5219^3$
$5400^3 + 90^4 = 630^4$
$217^3 + 5642^3 = 651^4$
$271^3 + 813^4 = 7588^3$
$602^3 + 903^4 = 8729^3$
$624^3 + 14352^3 = 312^5$
$1862^3 + 57722^3 = 3724^4$
$2246^3 + 4492^4 = 74118^3$
$1838^3 + 97414^3 = 5514^4$


Mysteriously, the summands have a common factor > 1 in each instance.
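
The common-factor observation is easy to check by machine. The sketch below verifies a handful of solutions of $A^x + B^y = C^z$ with exact integer arithmetic and reports $\gcd(A, B, C)$ for each; the tuple layout and the helper name `check` are choices made for this illustration only.

```python
from math import gcd

# Each tuple (A, x, B, y, C, z) encodes the claim A**x + B**y == C**z.
examples = [
    (3, 3, 6, 3, 3, 5),
    (7, 6, 7, 7, 98, 3),
    (27, 4, 162, 3, 9, 7),
    (211, 3, 3165, 3, 422, 4),
    (5400, 3, 90, 4, 630, 4),
]

def check(A, x, B, y, C, z):
    """Return gcd(A, B, C) if A**x + B**y == C**z holds, otherwise None."""
    if A**x + B**y != C**z:
        return None
    return gcd(gcd(A, B), C)

for ex in examples:
    print(ex, "common factor:", check(*ex))
```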


This conjecture is “wanted in Texas, dead or alive”. For the details, plus some additional 
links, see 
109.2 Euler quartic conjecture


Inspired by Fermat’s last theorem, Euler conjectured that there are no positive integer solutions
to the quartic equation

$$x^4 + y^4 + z^4 = w^4.$$

This conjecture was disproved by Elkies (1988), who found an infinite class of solutions. One
of the first solutions discovered was

$$2682440^4 + 15365639^4 + 18796760^4 = 20615673^4$$
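
Because Python integers have arbitrary precision, the counterexample can be confirmed directly. This is only a check of the identity with summands 2682440, 15365639 and 18796760, not a search procedure.

```python
# Exact integer check of the counterexample to Euler's quartic conjecture.
x, y, z, w = 2682440, 15365639, 18796760, 20615673
lhs = x**4 + y**4 + z**4
rhs = w**4
print(lhs == rhs)
```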
109.3 Fermat’s last theorem


The Theorem 


Fermat’s last theorem was put forth by Pierre de Fermat around 1630. It states that the
Diophantine equation ($a, b, c, n \in \mathbb{N}$)

$$a^n + b^n = c^n$$

has no non-zero solutions for $n > 2$.
History 


Fermat’s last theorem was actually a conjecture and remained unproved for over 300 years. It 
was finally proven in 1994 by Andrew Wiles, an English mathematician working at Princeton. 
It was always called a “theorem”, due to Fermat’s uncanny ability to propose true conjec- 
tures. Originally the statement was discovered by Fermat’s son Clement-Samuel among 
margin notes that Fermat had made in his copy of Diophantus’ Arithmetica. Fermat 
followed the statement of the conjecture with the infamous teaser: 


“I have discovered a truly remarkable proof which this margin is too small to contain” 


Over the years, Fermat’s last theorem was proven for various sub-cases which required specific
values of n, but no direct progress was made along these lines towards a general proof. These
proofs were bittersweet victories, as each one still left an infinite number of cases unproved.
Among the big names who took a crack at the theorem are Euler, Gauss, Germain, Cauchy,
Dirichlet, and Legendre.


The theorem finally began to yield to direct attack in the 20th century. 
Proof 


In 1982 Gerhard Frey conjectured that if FLT has a solution $(a, b, c, n)$, then the elliptic curve
defined by

$$y^2 = x(x - a^n)(x + b^n)$$

is semistable, but not modular. The above equation is known as Frey’s equation, or the Frey
curve. Ribet proved this conjecture in 1986.


The Taniyama-Shimura conjecture, which appeared in an early form in 1955, says that all
elliptic curves are modular. If in fact this conjecture were to be proven in the semistable
case, then it would follow that the Frey equation would be semistable and modular, hence
FLT could have no solutions.


After a flawed attempt in 1993, Wiles along with Richard Taylor successfully proved the
semistable case of the Taniyama-Shimura conjecture in 1994, hence proving Fermat’s last
theorem. The proof appears in the May 1995 Annals of Mathematics, Vol. 141, No. 3. It is
129 pages.


Speculation 


Wiles’ proof rests upon the work of hundreds of mathematicians and the mathematics created
up to and including the 20th century. We cannot imagine how Fermat’s last theorem could
be proved without these advanced mathematical tools, which include group theory and
the theory of modular forms, Riemannian topology, and the theory of elliptic
equations.


Could Fermat, then, have possibly had a proof to his own conjecture, in the year 1630? 
It doesn’t seem likely, given the requisite mathematics behind the proof as we know it. 
Assuming Fermat’s teaser was truthful, and Fermat was not in error, this “paradox” has
led some to (hopefully) jokingly attribute supernatural abilities to Fermat.


A more interesting possibility is that there is yet another proof, which is elementary and 
utilizes no more knowledge than Fermat had available in his day. 


Most mathematicians, however, think that Fermat was just in error. It is also possible that
he realized later that he didn’t have a solution, but of course did not amend the margin
notes where he wrote his tantalizing statement. Still, we cannot rule out the existence of a
simpler proof, so for some, the search continues...


Further Reading 


• Fermat’s Last Theorem, by J J O’Connor and E F Robertson.
• Fermat’s Last Theorem, web site by David Shay.
• (book, offline), by Simon Singh.
Chapter 110


11D79 — Congruences in many 
variables 


110.1 Chinese remainder theorem 


Suppose we have a set of $n$ congruences of the form

$$x \equiv a_1 \pmod{p_1}$$
$$x \equiv a_2 \pmod{p_2}$$
$$\vdots$$
$$x \equiv a_n \pmod{p_n}$$

where $p_1, p_2, \ldots, p_n$ are pairwise relatively prime. Let
$$P = \prod_{i=1}^{n} p_i$$
and, for all $i \in \mathbb{N}$ ($1 \le i \le n$), let $y_i$ be an integer that satisfies
$$y_i \frac{P}{p_i} \equiv 1 \pmod{p_i}.$$

Then one solution of these congruences is
$$x_0 = \sum_{i=1}^{n} a_i y_i \frac{P}{p_i}.$$

Any $x \in \mathbb{Z}$ satisfies the set of congruences if and only if it satisfies
$$x \equiv x_0 \pmod{P}.$$

The Chinese remainder theorem is said to have been used to count the size of the ancient
Chinese armies (i.e., the soldiers would split into groups of 3, then 5, then 7, etc., and the
“leftover” soldiers from each grouping would be counted).
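
The construction of $x_0$ translates almost line by line into code. The sketch below is one possible rendering, assuming Python 3.8 or later (for `math.prod` and for `pow` with exponent `-1`, which computes a modular inverse); the function name `crt` is illustrative.

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod p_i) for pairwise relatively prime moduli.

    Returns (x0 mod P, P), where P is the product of the moduli.
    """
    P = prod(moduli)
    x0 = 0
    for a_i, p_i in zip(residues, moduli):
        q_i = P // p_i              # P / p_i, the product of the other moduli
        y_i = pow(q_i, -1, p_i)     # y_i * (P / p_i) = 1 (mod p_i)
        x0 += a_i * y_i * q_i
    return x0 % P, P

# Soldiers counted in groups of 3, 5 and 7 leave 2, 3 and 2 over.
print(crt([2, 3, 2], [3, 5, 7]))
```

Here `pow(q_i, -1, p_i)` raises `ValueError` when the inverse does not exist, which is exactly the case where the moduli fail to be pairwise relatively prime.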
110.2 Chinese remainder theorem proof


We first prove the following lemma: if
$$a \equiv b \pmod{p}$$
$$a \equiv b \pmod{q}$$
$$\gcd(p, q) = 1$$
then
$$a \equiv b \pmod{pq}.$$

We know that for some $k \in \mathbb{Z}$, $a - b = kp$; likewise, for some $j \in \mathbb{Z}$, $a - b = jq$, so $kp = jq$.
Therefore $kp - jq = 0$.


It is a well-known theorem that, given $a, b, c, x_0, y_0 \in \mathbb{Z}$ such that $x_0 a + y_0 b = c$ and
$d = \gcd(a, b)$, any solutions to the Diophantine equation $ax + by = c$ are given by

$$x = x_0 + \frac{b}{d}n$$
$$y = y_0 - \frac{a}{d}n$$
where $n \in \mathbb{Z}$.


We apply this theorem to the Diophantine equation $kp - jq = 0$. Clearly one solution of this
Diophantine equation is $k = 0$, $j = 0$. Since $\gcd(q, p) = 1$, all solutions of this equation are
given by $k = nq$ and $j = np$ for any $n \in \mathbb{Z}$. So we have $a - b = npq$; therefore $pq$ divides
$a - b$, so $a \equiv b \pmod{pq}$, thus completing the lemma.


Now, to prove the Chinese remainder theorem, we first show that $y_i$ must exist for any
natural $i$ where $1 \le i \le n$. If
$$y_i \frac{P}{p_i} \equiv 1 \pmod{p_i}$$
then by definition there exists some $k \in \mathbb{Z}$ such that
$$y_i \frac{P}{p_i} - 1 = k p_i$$
which in turn implies that
$$y_i \frac{P}{p_i} - k p_i = 1.$$

This is a Diophantine equation with $y_i$ and $k$ being the unknown integers. It is a well-known
theorem that a Diophantine equation of the form

$$ax + by = c$$

has solutions for $x$ and $y$ if and only if $\gcd(a, b)$ divides $c$. Since $\frac{P}{p_i}$ is the product of each $p_j$
($j \in \mathbb{N}$, $1 \le j \le n$) except $p_i$, and every such $p_j$ is relatively prime to $p_i$, $\frac{P}{p_i}$ and $p_i$ are relatively
prime. Therefore, by definition, $\gcd(\frac{P}{p_i}, p_i) = 1$; since 1 divides 1, there are integers $k$ and $y_i$
that satisfy the above equation.


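Consider some $j \in \mathbb{N}$, $1 \le j \le n$. For any $i \in \mathbb{N}$, $1 \le i \le n$, either $i \ne j$ or $i = j$. If $i \ne j$,
then
$$a_i y_i \frac{P}{p_i} = \left(a_i y_i \frac{P}{p_i p_j}\right) p_j$$
so $p_j$ divides $a_i y_i \frac{P}{p_i}$, and we know
$$a_i y_i \frac{P}{p_i} \equiv 0 \pmod{p_j}.$$

Now consider the case that $i = j$. $y_j$ was selected so that
$$y_j \frac{P}{p_j} \equiv 1 \pmod{p_j}$$
so we know
$$a_j y_j \frac{P}{p_j} \equiv a_j \pmod{p_j}.$$

So we have a set of $n$ congruences mod $p_j$; summing them shows that
$$\sum_{i=1}^{n} a_i y_i \frac{P}{p_i} \equiv a_j \pmod{p_j}.$$
Therefore $x_0$ satisfies all the congruences.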


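Suppose we have some
$$x \equiv x_0 \pmod{P}.$$

This implies that for some $k \in \mathbb{Z}$,
$$x - x_0 = kP.$$

So, for any $p_i$, we know that
$$x - x_0 = \left(k \frac{P}{p_i}\right) p_i$$
so $x \equiv x_0 \pmod{p_i}$. Since congruence is transitive, $x$ must in turn satisfy all the original
congruences.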


Likewise, suppose we have some $x$ that satisfies all the original congruences. Then, for any
$p_i$, we know that
$$x \equiv a_i \pmod{p_i}$$
and since
$$x_0 \equiv a_i \pmod{p_i}$$
the transitive and symmetric properties of congruence imply that
$$x \equiv x_0 \pmod{p_i}$$
for all $p_i$. So, by our lemma, we know that
$$x \equiv x_0 \pmod{p_1 p_2 \cdots p_n}$$
or
$$x \equiv x_0 \pmod{P}.$$


Version: 3 Owner: vampyr Author(s): vampyr 




Chapter 111 


11D85 — Representation problems 


111.1 polygonal number 


A polygonal number, or figurate number, is any value of the function

$$P_d(n) = \frac{(d-2)n^2 + (4-d)n}{2}$$

for integers $n \ge 0$ and $d \ge 3$. A “generalized polygonal number” is any value of $P_d(n)$ for
some integer $d \ge 3$ and any $n \in \mathbb{Z}$. For fixed $d$, $P_d(n)$ is called a $d$-gonal or $d$-polygonal
number. For $d = 3, 4, 5, \ldots$, we speak of a triangular number, a square number or a square,
a pentagonal number, and so on.

An equivalent definition of $P_d$, by induction on $n$, is:
$P_d(0) = 0$
$P_d(n) = P_d(n-1) + (d-2)(n-1) + 1$ for all $n \ge 1$
$P_d(n-1) = P_d(n) + (d-2)(1-n) - 1$ for all $n \le 0$.

From these equations, we can deduce that all generalized polygonal numbers are nonnegative
integers. The first two formulas show that $P_d(n)$ points can be arranged in a set of $n$ nested
$d$-gons, as in this diagram of $P_3(5) = 15$ and $P_5(5) = 35$.
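
The closed formula and the inductive definition can be compared numerically. The following sketch (the name `polygonal` is illustrative) evaluates $P_d(n)$ with exact integer arithmetic and checks both recurrences over a small range, including negative $n$.

```python
def polygonal(d, n):
    """Generalized d-gonal number P_d(n) for integer d >= 3 and any integer n."""
    if d < 3:
        raise ValueError("d must be at least 3")
    # The numerator equals (d-2)*n*(n-1) + 2*n, which is always even.
    return ((d - 2) * n * n + (4 - d) * n) // 2

for d in range(3, 9):
    for n in range(1, 20):      # forward recurrence, n >= 1
        assert polygonal(d, n) == polygonal(d, n - 1) + (d - 2) * (n - 1) + 1
    for n in range(-20, 1):     # backward recurrence, n <= 0
        assert polygonal(d, n - 1) == polygonal(d, n) + (d - 2) * (1 - n) - 1

print(polygonal(3, 5), polygonal(5, 5))
```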


Polygonal numbers were studied somewhat by the ancients, as far back as the Pythagoreans, 
but nowadays their interest is mostly historical, in connection with this famous result: 


Theorem: For any $d \ge 3$, any integer $n \ge 0$ is the sum of some $d$ $d$-gonal numbers.

In other words, any nonnegative integer is a sum of three triangular numbers, four squares,
five pentagonal numbers, and so on. Fermat made this remarkable statement in a letter to
Mersenne. Regrettably, he never revealed the argument or proof that he had in mind. More
than a century passed before Lagrange proved the easiest case: Lagrange’s four-square theorem.
The case d = 3 was demonstrated by Gauss around 1797, and the general case by Cauchy
in 1813.
Chapter 112


11D99 — Miscellaneous 


112.1 Diophantine equation 


A Diophantine equation is an equation for which the solutions are required to be integers.


Generally, solving a Diophantine equation is not as straightforward as solving a similar
equation in the real numbers. For example, consider this equation:


It is easy to find real numbers $x, y, z$ that satisfy this equation: pick any arbitrary $x$ and
$y$, and you can compute a $z$ from them. But if we require that $x, y, z$ all be integers, it is
no longer obvious at all how to find solutions. Even though raising an integer to an integer
power yields another integer, the reverse is not true in general.


As it turns out, of course, there are no solutions to the above Diophantine equation: it is a 
case of 


At the Second International Congress of Mathematicians in 1900, David Hilbert presented 
several unsolved problems in mathematics that he believed held special importance. Hilbert’s 
tenth problem was to find a general procedure for determining if Diophantine equations have 
solutions: 


“Given a Diophantine equation with any number of unknowns and with rational integer 
coefficients: devise a process, which could determine by a finite number of operations whether 
the equation is solvable in rational integers.” 


Note that this preceded the formal study of computing and 


and it is unlikely that Hilbert had anticipated a negative solution (that is, a proof that
no such algorithm is possible), but that turned out to be the case. In the 1950s and 60s,
Martin Davis, Julia Robinson, and Hilary Putnam showed that an algorithm to determine
the solubility of all exponential Diophantine equations is impossible.
Chapter 113


11E39 — Bilinear and Hermitian forms 


113.1 Hermitian form 


A sesquilinear form over a complex vector space $V$ is a function $B : V \times V \to \mathbb{C}$ with the


for all x,y,z € V and c,d € C. 


A Hermitian form is a sesquilinear form $B$ which is also complex conjugate symmetric:
$B(x, y) = \overline{B(y, x)}$.


An inner product over a complex vector space is a positive definite Hermitian form. 
113.2 non-degenerate bilinear form


A bilinear form $B$ over a vector space $V$ is said to be non-degenerate when

• if $B(x, y) = 0$ for all $x \in V$, then $y = 0$, and

• if $B(x, y) = 0$ for all $y \in V$, then $x = 0$.
113.3 positive definite form


A bilinear form $B$ on a real or complex vector space $V$ is positive definite if $B(x, x) > 0$ for
all nonzero vectors $x \in V$. On the other hand, if $B(x, x) < 0$ for all nonzero vectors $x \in V$,
then we say $B$ is negative definite.


A form which is neither positive definite nor negative definite is called indefinite. 


Version: 1 Owner: djao Author(s): djao 


113.4 symmetric bilinear form 


A symmetric bilinear form is a bilinear form $B$ which is symmetric in the two coordinates;
that is, $B(x, y) = B(y, x)$ for all vectors $x$ and $y$.


Every inner product over a vector space is a positive definite symmetric bilinear form.
113.5 Clifford algebra


Let $V$ be a vector space over a field $k$, and $Q : V \times V \to k$ a symmetric bilinear form. Then
the Clifford algebra $\mathrm{Cliff}(Q, V)$ is the quotient of the tensor algebra $T(V)$ by the relations

$$v \otimes w + w \otimes v = -2Q(v, w) \qquad \forall v, w \in V.$$

Since the above relationship is not homogeneous in the usual $\mathbb{Z}$-grading on $T(V)$, $\mathrm{Cliff}(Q, V)$
does not inherit a $\mathbb{Z}$-grading. However, by reducing mod 2, we also have a $\mathbb{Z}_2$-grading on
$T(V)$, and the relations above are homogeneous with respect to this, so $\mathrm{Cliff}(Q, V)$ has a
natural $\mathbb{Z}_2$-grading, which makes it into a superalgebra.

In addition, we do have a filtration on $\mathrm{Cliff}(Q, V)$ (making it a filtered algebra), and the
associated graded algebra $\mathrm{Gr}\,\mathrm{Cliff}(Q, V)$ is simply $\Lambda^* V$, the exterior algebra of $V$. In particular,

$$\dim \mathrm{Cliff}(Q, V) = \dim \Lambda^* V = 2^{\dim V}.$$

The most commonly used Clifford algebra is the case $V = \mathbb{R}^n$, and $Q$ is the standard
inner product, with orthonormal basis $e_1, \ldots, e_n$. In this case, the algebra is generated by
$e_1, \ldots, e_n$ and the identity of the algebra 1, with the relations

$$e_i^2 = -1$$
$$e_i e_j = -e_j e_i \quad (i \ne j)$$

Trivially, $\mathrm{Cliff}(\mathbb{R}^0) = \mathbb{R}$, and it can be seen from the relations above that $\mathrm{Cliff}(\mathbb{R}) \cong \mathbb{C}$, the
complex numbers, and $\mathrm{Cliff}(\mathbb{R}^2) \cong \mathbb{H}$, the quaternions.

On the other hand, for $V = \mathbb{C}^n$ we get the particularly simple answer of

$$\mathrm{Cliff}(\mathbb{C}^{2k}) \cong M_{2^k}(\mathbb{C}), \qquad \mathrm{Cliff}(\mathbb{C}^{2k+1}) \cong M_{2^k}(\mathbb{C}) \oplus M_{2^k}(\mathbb{C}).$$


Chapter 114 


11Exx — Forms and linear algebraic 
groups 


114.1 quadratic function associated with a linear functional


Let $V$ be a real Hilbert space (and thus an inner product space), and let $f$ be a continuous
linear functional on $V$. Then $f$ has an associated quadratic function $\varphi : V \to \mathbb{R}$ given by

$$\varphi(v) = \frac{1}{2}\|v\|^2 - f(v).$$


Version: 3 Owner: drini Author(s): matte, drini, apmxi 




Chapter 115 


11F06 — Structure of modular groups 
and generalizations; arithmetic groups 


115.1 Taniyama-Shimura theorem 


For any natural number $N \ge 1$, define the modular group $\Gamma_0(N)$ to be the following subgroup
of the group $SL(2, \mathbb{Z})$ of integer coefficient matrices of determinant 1:

$$\Gamma_0(N) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{Z}) \;\middle|\; c \equiv 0 \pmod{N} \right\}.$$

Let $\mathbb{H}^*$ be the subset of the Riemann sphere consisting of all points in the upper half plane
(i.e., complex numbers with strictly positive imaginary part), together with the rational numbers
and the point at infinity. Then $\Gamma_0(N)$ acts on $\mathbb{H}^*$, with the action given by the operation

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot z := \frac{az + b}{cz + d}.$$

Define $X_0(N)$ to be the quotient of $\mathbb{H}^*$ by the action of $\Gamma_0(N)$. The quotient space $X_0(N)$
inherits a topology and a complex structure from $\mathbb{C}$, making it into a compact
Riemann surface. (Note: $\mathbb{H}^*$ itself is not a Riemann surface; only the quotient $X_0(N)$ is.)
By a general theorem in complex algebraic geometry, every compact Riemann surface admits
a unique realization as a complex nonsingular projective curve; in particular, $X_0(N)$ has such
a realization, which by abuse of notation we will also denote $X_0(N)$. This curve is defined
over $\mathbb{Q}$, although the proof of this fact is beyond the scope of this entry.¹


1 Explicitly, the curve $X_0(N)$ is the unique nonsingular projective curve which has function field equal
to $\mathbb{C}(j(z), j(Nz))$, where $j$ denotes the elliptic modular $j$-function. The curve $X_0(N)$ is essentially the
algebraic curve defined by the polynomial equation $\Phi_N(X, Y) = 0$ where $\Phi_N$ is the modular polynomial,
with the caveat that this procedure yields singularities which must be resolved manually. The fact that $\Phi_N$
has integer coefficients provides one proof that $X_0(N)$ is defined over $\mathbb{Q}$.
Taniyama-Shimura Theorem (weak form): For any elliptic curve $E$ defined over $\mathbb{Q}$,
there exists a positive integer $N$ and a surjective algebraic morphism $\phi : X_0(N) \to E$
defined over $\mathbb{Q}$.


This theorem was first conjectured (in a much more precise, but equivalent formulation) by
Taniyama, Shimura, and Weil in the 1970’s. It attracted considerable interest in the 1980’s
when Frey [2] proposed that the Taniyama-Shimura conjecture implies Fermat’s last theorem.
In 1995, Andrew Wiles [3] proved a special case of the Taniyama-Shimura theorem which
was strong enough to yield a proof of Fermat’s Last Theorem. The full Taniyama-Shimura
theorem was finally proved in 1997 by a team of a half-dozen mathematicians who, building
on Wiles’s work, incrementally chipped away at the remaining cases until the full result
was proved. As of this writing, the proof of the full theorem can still be found on
Richard Taylor’s preprints page.

REFERENCES


1. Breuil, Christophe; Conrad, Brian; Diamond, Fred; Taylor, Richard; On the modularity of 
elliptic curves over Q: wild 3-adic exercises. J. Amer. Math. Soc. 14 (2001), no. 4, 843-939 

2. Frey, G. Links between stable elliptic curves and certain Diophantine equations. Ann. Univ. 
Sarav. 1 (1986), 1-40. 

3. Wiles, A. Modular elliptic curves and Fermat’s Last Theorem. Annals of Math. 141 (1995), 
443-551. 


Version: 10 Owner: djao Author(s): djao 




Chapter 116 


11F30 — Fourier coefficients of 
automorphic forms 


116.1 Fourier coefficients 


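Let $f$ be a Riemann integrable function from $[-\pi, \pi]$ to $\mathbb{R}$. Then the numbers

$$a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,dx,$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx,$$
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx$$

are called the Fourier coefficients of the function $f$.


The trigonometric series

$$a_0 + \sum_{n=1}^{\infty} \left(a_n\cos(nx) + b_n\sin(nx)\right)$$

is called the trigonometric series of the function $f$, or Fourier series of the function $f$.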
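
As a rough numerical illustration, the coefficients can be approximated with a midpoint rule. The sketch below assumes the normalization $a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f$, $a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx$, $b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx$; the function name `fourier_coefficients` and the sample count are arbitrary choices, and the results are approximations only.

```python
import math

def fourier_coefficients(f, n_max, samples=2000):
    """Approximate a_0, [a_1..a_n_max], [b_1..b_n_max] of f on [-pi, pi]."""
    h = 2 * math.pi / samples
    # Midpoints of `samples` equal subintervals of [-pi, pi].
    xs = [-math.pi + (k + 0.5) * h for k in range(samples)]
    a0 = sum(f(x) for x in xs) * h / (2 * math.pi)
    a = [sum(f(x) * math.cos(n * x) for x in xs) * h / math.pi
         for n in range(1, n_max + 1)]
    b = [sum(f(x) * math.sin(n * x) for x in xs) * h / math.pi
         for n in range(1, n_max + 1)]
    return a0, a, b

# For f(x) = x the exact values are a_n = 0 and b_n = 2 * (-1)**(n + 1) / n.
a0, a, b = fourier_coefficients(lambda x: x, 2)
print(a0, a, b)
```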
Chapter 117


11F67 — Special values of automorphic 
L-series, periods of modular forms, 
cohomology, modular symbols 


117.1 Schanuel’s conjecture


Let $x_1, x_2, \ldots, x_n$ be complex numbers linearly independent over $\mathbb{Q}$. Then the set
$$\{x_1, x_2, \ldots, x_n, e^{x_1}, e^{x_2}, \ldots, e^{x_n}\}$$

has transcendence degree greater than or equal to $n$. Though seemingly innocuous, a proof
of Schanuel’s conjecture would prove hundreds of open conjectures in transcendental number
theory.
117.2 period


A real number $x$ is a period if it is expressible as the integral of an algebraic function (with
algebraic coefficients) over an algebraic domain, and this integral is absolutely convergent.
This representation is called the number’s period representation. An algebraic domain
is a subset of $\mathbb{R}^n$ given by polynomial inequalities with algebraic coefficients. A
complex number is defined to be a period if both its real and imaginary parts are. The set
of all complex periods is denoted by $\mathcal{P}$.
117.2.1 Examples


Example 1. The transcendental number $\pi$ is a period since we can write
$$\pi = \int_{x^2 + y^2 \le 1} dx\,dy.$$

Example 2. Any algebraic number $\alpha$ is a period since we use the somewhat natural definition
that integration over a 0-dimensional space is taken to mean evaluation:
$$\alpha = \int_{\{\alpha\}} x.$$

Example 3. Logarithms of algebraic numbers are periods:
$$\log \alpha = \int_1^{\alpha} \frac{1}{x}\,dx.$$


117.2.2 Non-periods 


It is by no means trivial to find complex non-periods, though their existence is clear by a
counting argument: the set of complex numbers is uncountable, whereas the set of periods
is countable, as there are only countably many algebraic domains to choose and countably
many algebraic functions over which to integrate.


117.2.3 Inclusion 


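With the existence of a non-period, we have the following chain of set inclusions:

$$\mathbb{Z} \subsetneq \mathbb{Q} \subsetneq \overline{\mathbb{Q}} \subsetneq \mathcal{P} \subsetneq \mathbb{C},$$

where $\overline{\mathbb{Q}}$ denotes the set of algebraic numbers. The periods promise to prove an interesting
and important set of numbers in that nebulous region between $\overline{\mathbb{Q}}$ and $\mathbb{C}$.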


117.2.4 References 


Kontsevich and Zagier. Periods. 2001. Available online at http://www.ihes.fr/PREPRINTS/M01/M01-2%
Chapter 118


11G05 — Elliptic curves over global 
fields 


118.1 complex multiplication 


Let $E$ be an elliptic curve. The endomorphism ring of $E$, denoted $\mathrm{End}(E)$, is the set of
all regular maps $\phi : E \to E$ such that $\phi(O) = O$, where $O \in E$ is the identity element for
the group structure of $E$. Note that this is indeed a ring under addition ($(\phi + \psi)(P) =
\phi(P) + \psi(P)$) and composition of maps.

The following theorem implies that every endomorphism is also a group endomorphism:


Theorem 8. Let $E_1, E_2$ be elliptic curves, and let $\phi : E_1 \to E_2$ be a regular map such that
$\phi(O_{E_1}) = O_{E_2}$. Then $\phi$ is also a group homomorphism, i.e.
$\phi(P + Q) = \phi(P) + \phi(Q)$ for all $P, Q \in E_1$.


[Proof: See [2], Theorem 4.8, page 75]


If $\mathrm{End}(E)$ is isomorphic (as a ring) to an order $R$ in a quadratic imaginary field $K$ then we
say that the elliptic curve $E$ has complex multiplication by $K$ (or complex multiplication by
$R$).


Note: $\mathrm{End}(E)$ always contains a subring isomorphic to $\mathbb{Z}$, formed by the multiplication by $n$
maps:

$$[n] : E \to E, \qquad [n]P = n \cdot P$$

and, in general, these are all the maps in the endomorphism ring of $E$.
Example: Fix $d \in \mathbb{Z}$. Let $E$ be the elliptic curve defined by
$$y^2 = x^3 - dx;$$
then this curve has complex multiplication by $\mathbb{Q}(i)$ (more concretely by $\mathbb{Z}[i]$). Besides the
multiplication by $n$ maps, $\mathrm{End}(E)$ contains a genuine new element:

$$[i] : E \to E, \qquad [i](x, y) = (-x, iy)$$

(the name complex multiplication comes from the fact that we are “multiplying” the points
in the curve by a complex number, $i$ in this case).


REFERENCES 
4. Goro Shimura, Introduction to the Arithmetic Theory of Automorphic Functions. Princeton
University Press, Princeton, New Jersey, 1971.
Chapter 119


11H06 — Lattices and convex bodies 


119.1 Minkowski’s theorem 


Let $\mathcal{L} \subset \mathbb{R}^2$ be a lattice in the sense of number theory, i.e. a 2-dimensional free group over
$\mathbb{Z}$ which generates $\mathbb{R}^2$ over $\mathbb{R}$. Let $w_1, w_2$ be generators of the lattice $\mathcal{L}$. A set $F$ of the form

$$F = \{(x, y) \in \mathbb{R}^2 : (x, y) = \alpha w_1 + \beta w_2,\ 0 \le \alpha < 1,\ 0 \le \beta < 1\}$$

is usually called a fundamental domain or fundamental parallelogram for the lattice $\mathcal{L}$.


Theorem 9 (Minkowski’s Theorem). Let $\mathcal{L}$ be an arbitrary lattice in $\mathbb{R}^2$ and let $A$ be
the area of a fundamental parallelogram. Any convex region $R$ symmetrical about the origin
and of area greater than $4A$ contains points of the lattice $\mathcal{L}$ other than the origin.
119.2 lattice in $\mathbb{R}^n$


Definition 4. A lattice in $\mathbb{R}^n$ is an n-dimensional additive free group over $\mathbb{Z}$ which generates
$\mathbb{R}^n$ over $\mathbb{R}$.


Example: The following is an example of a lattice $\mathcal{L} \subset \mathbb{R}^2$, generated by $w_1 = (1, 2)$, $w_2 = (4, 1)$.

$$\mathcal{L} = \{\alpha w_1 + \beta w_2 \mid \alpha, \beta \in \mathbb{Z}\}$$
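
For this example lattice the area of a fundamental parallelogram is the absolute value of the determinant of the generators, and Minkowski's theorem can be observed on a disc. The sketch below takes the open disc of radius 3, whose area $9\pi \approx 28.27$ exceeds four times the parallelogram area, and lists the nonzero lattice points inside it; the helper name and the search window are illustrative choices.

```python
import math

w1, w2 = (1, 2), (4, 1)

# Area of the fundamental parallelogram spanned by w1 and w2.
area = abs(w1[0] * w2[1] - w1[1] * w2[0])

def lattice_points_in_disc(radius, search=10):
    """Nonzero points a*w1 + b*w2 with |a|, |b| <= search inside the open disc."""
    points = set()
    for a in range(-search, search + 1):
        for b in range(-search, search + 1):
            x = a * w1[0] + b * w2[0]
            y = a * w1[1] + b * w2[1]
            if (x, y) != (0, 0) and x * x + y * y < radius * radius:
                points.add((x, y))
    return points

print(area, math.pi * 3**2 > 4 * area, lattice_points_in_disc(3))
```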
Chapter 120


11H46 — Products of linear forms 


120.1 triple scalar product 


The triple scalar product of three vectors $\vec{a}, \vec{b}, \vec{c}$ combines the dot product and the cross product. It is defined
as

$$\det\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} = \vec{a} \cdot (\vec{b} \times \vec{c}) = a_1 \det\begin{pmatrix} b_2 & c_2 \\ b_3 & c_3 \end{pmatrix} - a_2 \det\begin{pmatrix} b_1 & c_1 \\ b_3 & c_3 \end{pmatrix} + a_3 \det\begin{pmatrix} b_1 & c_1 \\ b_2 & c_2 \end{pmatrix}$$

The determinant above is positive if the three vectors satisfy the right-hand rule and negative
otherwise. Recall that the magnitude of the cross product of two vectors is equivalent to the
area of the parallelogram they form, and the dot product is equivalent to the product of the
projection of one vector onto another with the length of the vector projected upon. Putting
these two ideas together, we can see that

$$|\vec{a} \cdot (\vec{b} \times \vec{c})| = |\vec{b} \times \vec{c}|\,|\vec{a}|\,|\cos\theta| = \text{base} \cdot \text{height} = \text{Volume of parallelepiped}$$

Thus, the magnitude of the triple scalar product is equivalent to the volume of the
parallelepiped formed by the three vectors. (A parallelepiped is a three-dimensional object where
opposing faces are parallel. An example is a brick or sheared brick.) It follows that the triple
scalar product of three coplanar or collinear vectors is then 0.

Identities related to the triple scalar product:

$$(\vec{A} \times \vec{B}) \cdot \vec{C} = (\vec{B} \times \vec{C}) \cdot \vec{A} = (\vec{C} \times \vec{A}) \cdot \vec{B}$$

$$\vec{A} \cdot (\vec{B} \times \vec{C}) = -\vec{A} \cdot (\vec{C} \times \vec{B})$$

The latter is implied by the properties of the cross product.
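
These identities are easy to test on concrete vectors. The following sketch uses plain tuples of integers; the helper names `dot`, `cross` and `triple` are illustrative.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def triple(a, b, c):
    """Triple scalar product a . (b x c)."""
    return dot(a, cross(b, c))

a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)
print(triple(a, b, c))                        # signed volume
print(triple(b, c, a), triple(c, a, b))       # cyclic permutations agree
print(triple(a, c, b))                        # swapping two vectors flips the sign
print(triple((1, 2, 3), (2, 4, 6), (0, 1, 0)))  # collinear pair gives 0
```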
Chapter 121


11J04 — Homogeneous approximation 
to one number 


121.1 Dirichlet’s approximation theorem


Theorem (Dirichlet, c. 1840): For any real number $\theta$ and any integer $n \ge 1$, there exist
integers $a$ and $b$ such that $1 \le a \le n$ and $|a\theta - b| \le \frac{1}{n+1}$.


Proof: We can suppose $n \ge 2$. For each integer $a$ in the interval $[1, n]$, write $r_a = a\theta - [a\theta] \in
[0, 1)$. Since the $n + 2$ numbers $0, r_a, 1$ all lie in the same unit interval, some two of them
differ (in absolute value) by at most $\frac{1}{n+1}$. If 0 or 1 is in any such pair, then the other element
of the pair is one of the $r_a$, and we are done. If not, then $0 \le r_k - r_l \le \frac{1}{n+1}$ for some distinct
$k$ and $l$. If $k > l$ we have $r_k - r_l = r_{k-l}$, since each side is in $[0, 1)$ and the difference between
them is an integer. Similarly, if $k < l$, we have $1 - (r_k - r_l) = r_{l-k}$. So, with $a = k - l$ or
$a = l - k$ respectively, we get

$$|r_a - c| \le \frac{1}{n+1}$$

where $c$ is 0 or 1, and the result follows.
It is clear that we can add the condition $\gcd(a, b) = 1$ to the conclusion.
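
The theorem guarantees that a direct search over $1 \le a \le n$ always succeeds. The sketch below performs that search with exact rational arithmetic; it converts a float input to the exact fraction it represents, so for an irrational $\theta$ it works with the floating-point approximation. The function name `dirichlet_approx` is illustrative.

```python
import math
from fractions import Fraction

def dirichlet_approx(theta, n):
    """Return (a, b) with 1 <= a <= n and |a*theta - b| <= 1/(n+1)."""
    theta = Fraction(theta)          # exact value of the given number
    bound = Fraction(1, n + 1)
    for a in range(1, n + 1):
        b = round(a * theta)         # nearest integer to a*theta
        if abs(a * theta - b) <= bound:
            return a, b
    raise AssertionError("unreachable: excluded by Dirichlet's theorem")

print(dirichlet_approx(math.sqrt(2), 10))
print(dirichlet_approx(Fraction(1, 4), 3))
```

The second call shows that equality in the bound can occur: $\theta = 1/4$ and $n = 3$ give $a = 1$, $b = 0$ with $|a\theta - b| = 1/4$ exactly.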


The same statement, but with the weaker conclusion $|a\theta - b| < \frac{1}{n}$, admits a slightly shorter
proof, and is sometimes also referred to as the Dirichlet approximation theorem. (It was that
shorter proof which made the “pigeonhole principle” famous.) Also, the theorem is sometimes
restricted to irrational values of $\theta$, with the (nominally stronger) conclusion $|a\theta - b| < \frac{1}{n+1}$.


Version: 2 Owner: Koro Author(s): Larry Hammick 




Chapter 122 


11J68 — Approximation to algebraic 
numbers 


122.1 Davenport-Schmidt theorem 


For any real $\xi$ which is not rational or quadratic irrational, there are infinitely many rational
or real quadratic irrational $\alpha$ which satisfy
$$|\xi - \alpha| < C \cdot H(\alpha)^{-3},$$
where
$$C = \begin{cases} C_0, & \text{if } |\xi| < 1, \\ C_0 \xi^2, & \text{if } |\xi| > 1. \end{cases}$$

$C_0$ is any fixed number greater than $\frac{160}{9}$ and $H(\alpha)$ is the height of $\alpha$.
122.2 Liouville approximation theorem


Given $\alpha$, a real algebraic number of degree $n \ne 1$, there is a constant $c = c(\alpha) > 0$ such that
for all rational numbers $p/q$, $(p, q) = 1$, the inequality

$$\left|\alpha - \frac{p}{q}\right| > \frac{c(\alpha)}{q^n}$$

holds.


Many mathematicians have worked at strengthening this theorem:


• Thue: If $\alpha$ is an algebraic number of degree $n \ge 3$, then there is a constant $c_0 =
c_0(\alpha, \epsilon) > 0$ such that for all rational numbers $p/q$, the inequality

$$\left|\alpha - \frac{p}{q}\right| > c_0\, q^{-1-\epsilon-n/2}$$

holds.


• Siegel: If $\alpha$ is an algebraic number of degree $n \ge 2$, then there is a constant $c_1 =
c_1(\alpha, \epsilon) > 0$ such that for all rational numbers $p/q$, the inequality

$$\left|\alpha - \frac{p}{q}\right| > c_1\, q^{-\lambda}, \qquad \lambda = \min_{t=1,\ldots,n}\left(\frac{n}{t+1} + t\right) + \epsilon$$

holds.


• Dyson: If $\alpha$ is an algebraic number of degree $n > 3$, then there is a constant $c_2 =
c_2(\alpha, \epsilon) > 0$ such that for all rational numbers $p/q$ with $q > c_2$, the inequality

$$\left|\alpha - \frac{p}{q}\right| > q^{-\sqrt{2n}-\epsilon}$$

holds.


• Roth: If $\alpha$ is an irrational algebraic number and $\epsilon > 0$, then there is a constant
$c_3 = c_3(\alpha, \epsilon) > 0$ such that for all rational numbers $p/q$, the inequality

$$\left|\alpha - \frac{p}{q}\right| > c_3\, q^{-2-\epsilon}$$

holds.
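122.3 proof of Liouville approximation theorem

Let $\alpha$ satisfy the equation $f(\alpha) = a_n\alpha^n + a_{n-1}\alpha^{n-1} + \cdots + a_0 = 0$ where the $a_i$ are integers.
Choose $M$ such that $M > \max_{\alpha-1 \le x \le \alpha+1} |f'(x)|$.


Suppose $\frac{p}{q}$ lies in $(\alpha - 1, \alpha + 1)$ and $f\left(\frac{p}{q}\right) \ne 0$. Then

$$\left|f\left(\frac{p}{q}\right)\right| = \frac{|a_n p^n + a_{n-1} p^{n-1} q + \cdots + a_0 q^n|}{q^n} \ge \frac{1}{q^n}$$

since the numerator is a non-zero integer.
By the mean-value theorem, for some $x$ between $\frac{p}{q}$ and $\alpha$,

$$\frac{1}{q^n} \le \left|f\left(\frac{p}{q}\right) - f(\alpha)\right| = \left|\left(\frac{p}{q} - \alpha\right) f'(x)\right| < M\left|\frac{p}{q} - \alpha\right|.$$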
Chapter 123


11J72 — Irrationality; linear 
independence over a field 


123.1 nth root of 2 is irrational for n > 3 (proof using 
Fermat’s last theorem) 


Proof. Suppose $n \ge 3$, and suppose $\sqrt[n]{2} = a/b$ for some positive integers $a, b$. It follows
that $2 = a^n/b^n$, or
$$b^n + b^n = a^n. \qquad (123.1.1)$$

We can now apply a recent result of Andrew Wiles [1], which states that there are no non-zero
integers $a, b$ satisfying equation (123.1.1). Thus $\sqrt[n]{2}$ is irrational. □


The above proof is given in [2], where it is attributed to W.H. Schultz.
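123.2 e is irrational (proof)


$$e = \sum_{k=0}^{\infty} \frac{1}{k!} \qquad (1)$$

Now let us assume that $e$ is rational. This would mean there are two natural numbers $a$ and
$b$, such that:
$$e = \frac{a}{b}.$$
This yields:
$$b!\,e \in \mathbb{N}.$$

Now we can write $e$ using (1):
$$b!\,e = b! \sum_{k=0}^{\infty} \frac{1}{k!}.$$

This can also be written:

$$b!\,e = \sum_{k=0}^{b} \frac{b!}{k!} + \sum_{k=b+1}^{\infty} \frac{b!}{k!}.$$

The first sum is obviously a natural number, and thus the second sum must be a natural number as well. However,

$$0 < \sum_{k=b+1}^{\infty} \frac{b!}{k!} = \frac{1}{b+1} + \frac{1}{(b+1)(b+2)} + \cdots < \sum_{k=1}^{\infty} \left(\frac{1}{b+1}\right)^k = \frac{1}{b} \le 1.$$

We have also seen that this is an integer, but there is no integer between 0 and 1. So there
cannot exist two natural numbers $a$ and $b$ such that $e = \frac{a}{b}$, so $e$ is irrational.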
123.3 irrational


An irrational number is a real number which cannot be represented as a ratio of two integers.
That is, if $x$ is irrational, then
$$x \ne \frac{a}{b}$$
with $a, b \in \mathbb{Z}$, $b \ne 0$.
123.4 square root of 2 is irrational

Assume that the square root of 2 ($\sqrt{2}$) is rational. Then we can write
$$\sqrt{2} = \frac{a}{b}$$
where $a, b \in \mathbb{N}$ and $a$ and $b$ are relatively prime. But then
$$2 = (\sqrt{2})^2 = \frac{a^2}{b^2}, \qquad 2b^2 = a^2.$$
From the above we have $2 \mid a^2$, and since 2 is prime it must divide $a$. Now we can write $c = a/2$ and
$$2b^2 = 4c^2, \qquad b^2 = 2c^2.$$

From the above we have $2 \mid b^2$, and since 2 is prime it must divide $b$.

But if $2 \mid a$ and $2 \mid b$, then $a$ and $b$ are not relatively prime, which contradicts the hypothesis. Hence the initial assumption is false and $\sqrt{2}$ is irrational.

With a little bit of work this argument can be generalized to any positive integer that is not a square. Let $n$ be such an integer; then there must exist a prime $p$ such that $n = p^m k$, where $p \nmid k$ and $m$ is odd. Assume that $\sqrt{n} = a/b$, where $a, b \in \mathbb{N}$ and are relatively prime. This is equivalent to
$$n b^2 = p^m k b^2 = a^2.$$

From the fundamental theorem of arithmetic it is clear that the maximum powers of $p$ that divide $a^2$ and $b^2$ are even. So, since $m$ is odd, the maximum power of $p$ that divides $p^m k b^2$ is also odd, and, from the above equation, the same should be true for $a^2$. Hence, we have reached a contradiction and $\sqrt{n}$ must be irrational.

The same argument can be generalized even further, for example to the case of nonsquare irreducible fractions and to higher order roots.
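
The generalized argument hinges on finding a prime $p$ that divides $n$ to an odd power. As a minimal sketch (the function name `odd_exponent_prime` is our own and not part of the original entry, and plain trial division is assumed to be fast enough for small $n$), such a prime can be exhibited as follows:

```python
def odd_exponent_prime(n):
    """Return (p, m) with p prime, p**m exactly dividing n and m odd.

    Returns None when n is a perfect square (no such prime exists).
    Illustrative helper only: plain trial division, intended for small n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    p = 2
    while p * p <= n:
        m = 0
        while n % p == 0:
            n //= p
            m += 1
        if m % 2 == 1:
            return p, m
        p += 1
    # What is left is either 1 or a single prime with exponent 1.
    return (n, 1) if n > 1 else None
```

For example, `odd_exponent_prime(12)` picks out $p = 3$ with $m = 1$ (since $12 = 2^2 \cdot 3$), while a perfect square such as 36 yields `None`, which is exactly the case excluded by the argument.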
Chapter 124

11J81 — Transcendence (general theory)

124.1 Fundamental Theorem of Transcendence

The tongue-in-cheek name given to the fact that if $n$ is a nonzero integer, then $|n| \ge 1$. This trick is used in many transcendence proofs. In fact, the hardest step of many problems is showing that a particular integer is not zero.

124.2 Gelfond’s theorem

Let $\alpha$ and $\beta$ be algebraic over $\mathbb{Q}$, with $\beta$ irrational and $\alpha$ not equal to 0 or 1. Then $\alpha^\beta$ is transcendental over $\mathbb{Q}$.

This is perhaps the most useful result in determining whether a number is algebraic or transcendental.
124.3 four exponentials conjecture

Four exponentials conjecture: Given four complex numbers $x_1, x_2, y_1, y_2$, either $x_1/x_2$ or $y_1/y_2$ is rational, or one of the four numbers $\exp(x_i y_j)$ is transcendental.

This conjecture is stronger than the six exponentials theorem.
124.4 six exponentials theorem

Complex numbers $x_1, x_2, \ldots, x_n$ are $\mathbb{Q}$-linearly independent if the only rational numbers $r_1, r_2, \ldots, r_n$ with
$$r_1 x_1 + r_2 x_2 + \cdots + r_n x_n = 0$$
are $r_1 = r_2 = \cdots = r_n = 0$.

Six Exponentials Theorem: If $x_1, x_2, x_3$ are $\mathbb{Q}$-linearly independent, and $y_1, y_2$ are also $\mathbb{Q}$-linearly independent, then at least one of the six numbers $\exp(x_i y_j)$ is transcendental.

This is weaker than the four exponentials conjecture:

Four Exponentials Conjecture: Given four complex numbers $x_1, x_2, y_1, y_2$, either $x_1/x_2$ or $y_1/y_2$ is rational, or one of the four numbers $\exp(x_i y_j)$ is transcendental.

For the history of the six exponentials theorem, we quote briefly from [6, p. 15]:

The six exponentials theorem occurs for the first time in a paper by L. Alaoglu and P. Erdős [1], when these authors try to prove Ramanujan’s assertion that the quotient of two consecutive superior highly composite numbers is a prime, they need to know that if $x$ is a real number such that $p_1^x$ and $p_2^x$ are both rational numbers, with $p_1$ and $p_2$ distinct prime numbers, then $x$ is an integer. However, this statement (special case of the four exponentials conjecture) is yet unproven. They quote C. L. Siegel and claim that $x$ indeed is an integer if one assumes $p_i^x$ to be rational for three distinct primes $p_i$. This is just a special case of the six exponentials theorem. They deduce that the quotient of two consecutive superior highly composite numbers is either a prime, or else a product of two primes.

The six exponentials theorem can be deduced from a very general result of Th. Schneider [4]. The four exponentials conjecture is equivalent to the first of the eight problems at the end of Schneider’s book [5]. An explicit statement of the six exponentials theorem, together with a proof, has been published independently and at about the same time by S. Lang [2, Chapter 2] and K. Ramachandra [3, Chapter 2]. They both formulated the four exponentials conjecture explicitly.
124.5 transcendental number

A transcendental number is a complex number that is not an algebraic number. The most famous transcendental numbers are $\pi$ and $e$ (the natural log base).

Cantor showed that, in a sense, “almost all” numbers are transcendental, because the algebraic numbers are countable, whereas the transcendental numbers are not.
Chapter 125

11K16 — Normal numbers, radix expansions, etc.

125.1 absolutely normal

Let $x \in \mathbb{R}$ and $b \in \mathbb{N}$ with $b \ge 2$. We consider the sequence of digits of $x$ in base $b$. If $s$ is a finite string of digits in base $b$ and $n \in \mathbb{N}$, we let $N(s,n)$ be the number of times the string $s$ occurs among the first $n$ digits of $x$ in base $b$. We say that $x$ is normal in base $b$ if
$$\lim_{n \to \infty} \frac{N(s,n)}{n} = \frac{1}{b^k}$$
for every string $s$ of length $k$.

Intuitively, $x$ is normal in base $b$ if all digits and digit-blocks in the base-$b$ digit sequence of $x$ occur just as often as would be expected if the sequence had been produced completely randomly.

We say that $x$ is absolutely normal if it is normal in every base $b \ge 2$. (Some authors use the term “normal” instead of “absolutely normal”.)


Absolutely normal numbers were first defined by Émile Borel in 1909. Borel also proved that almost all real numbers are absolutely normal, in the sense that the numbers that are not absolutely normal form a set with Lebesgue measure zero. However, for any base $b$, there are uncountably many numbers that are not normal in base $b$.

Champernowne’s number
$$0.1234567891011121314\ldots$$
(obtained by concatenating the decimal expansions of all natural numbers) is normal in base 10, but it has not been shown to be absolutely normal.
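
The definition of normality can be explored empirically on a finite prefix of Champernowne’s number. The sketch below is only an illustration under our own naming (`champernowne_digits` and `block_frequency` are hypothetical helpers, not from the original entry); it computes the ratio $N(s,n)/n$ for a digit block $s$, counting overlapping occurrences. A finite prefix can of course suggest, but never prove, the limiting behaviour.

```python
from itertools import count


def champernowne_digits(n_digits):
    """First n_digits decimal digits of Champernowne's number after the decimal point."""
    pieces = []
    total = 0
    for k in count(1):
        s = str(k)
        pieces.append(s)
        total += len(s)
        if total >= n_digits:
            break
    return "".join(pieces)[:n_digits]


def block_frequency(digits, block):
    """Relative frequency N(s, n) / n of `block` among the n given digits.

    Overlapping occurrences are counted, matching the definition of N(s, n).
    """
    k = len(block)
    hits = sum(1 for i in range(len(digits) - k + 1) if digits[i:i + k] == block)
    return hits / len(digits)


if __name__ == "__main__":
    prefix = champernowne_digits(100_000)
    for d in "0123456789":
        # For a number normal in base 10 these ratios tend to 1/10 as n grows.
        print(d, block_frequency(prefix, d))
```

Convergence for Champernowne’s number is slow, so the frequencies on a short prefix are visibly uneven (small digits are over-represented); this is consistent with the definition, which only constrains the limit.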


Few absolutely normal numbers are known. The first one was constructed by Sierpiński in 1916, and a related construction led to a computable absolutely normal number in 2002. Maybe the most prominent absolutely normal number is Chaitin’s constant $\Omega$, which is not computable.

No rational number can be normal in any base. Beyond that, it is extremely hard to prove or disprove normality of a given constant. For instance, it has been conjectured that all irrational algebraic numbers are absolutely normal since no counterexamples are known; on the other hand, not a single irrational algebraic number has been proven normal in any base. Likewise, it is conjectured that the transcendental constants $\pi$, $e$, and $\ln(2)$ are normal, and this is supported by some empirical evidence, but a proof is out of reach. We don’t even know which digits occur infinitely often in the decimal expansion of $\pi$.
Chapter 126

11K45 — Pseudo-random numbers; Monte Carlo methods

126.1 pseudorandom numbers

Generated in a digital computer by a numerical algorithm, pseudorandom numbers are not random, but should appear to be random when used in Monte Carlo calculations.

The most widely used and best understood pseudorandom generator is the Lehmer multiplicative congruential generator, in which each number $r$ is calculated as a function of the preceding number in the sequence:
$$r_i = [a\, r_{i-1}] \pmod m$$
or
$$r_i = [a\, r_{i-1} + c] \pmod m$$
where $a$ and $c$ are carefully chosen constants, and $m$ is usually a power of two, $2^k$. All quantities appearing in the formula (except $m$) are integers of $k$ bits. The expression in brackets is an integer of length $2k$ bits, and the effect of the modulo (mod $m$) is to mask off the most significant part of the result of the multiplication. $r_0$ is the seed of a generation sequence; many generators allow one to start with a different seed for each run of a program, to avoid re-generating the same sequence, or to preserve the seed at the end of one run for the beginning of a subsequent one. Before being used in calculations, the $r_i$ are usually transformed to floating point numbers normalized into the range [0, 1]. Generators of this type can be found which attain the maximum possible period of $2^{k-2}$ and whose sequences pass all reasonable tests of “randomness”, provided one does not exhaust more than a few percent of the full period.
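
The recurrences above can be sketched in a few lines. The code below is a toy illustration, not a production generator: the names `lehmer`, `period` and `to_unit_interval` are our own, and the constants used in the usage note ($a = 5$, $m = 2^8$) are deliberately tiny so that the period can be counted by brute force.

```python
def lehmer(seed, a, m, c=0):
    """Yield r_1, r_2, ... with r_i = (a * r_{i-1} + c) mod m, starting from r_0 = seed."""
    r = seed
    while True:
        r = (a * r + c) % m
        yield r


def period(seed, a, m, c=0):
    """Number of steps until the sequence first returns to the seed.

    Assumes the map is invertible mod m (for example gcd(a, m) = 1);
    raises ValueError if the seed is not revisited within m steps.
    """
    for steps, r in enumerate(lehmer(seed, a, m, c), start=1):
        if r == seed:
            return steps
        if steps > m:
            raise ValueError("sequence never returns to the seed")


def to_unit_interval(r, m):
    """Normalize an integer state r (0 <= r < m) to a float in [0, 1)."""
    return r / m
```

With $m = 2^8$, $a = 5$ and an odd seed, `period(1, 5, 256)` gives 64, which is the $2^{k-2}$ quoted above for $k = 8$; adding an odd increment (`c=1`) lets this particular toy generator cycle through all 256 residues.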
126.2 quasirandom numbers

Quasirandom numbers are sequences of numbers to be used in Monte Carlo calculations, optimized not to appear highly random, but rather to give the fastest convergence in the computation. They are applicable mainly to multidimensional integration, where the theory is based on that of uniformity of distribution ([Kuipers74]).

Because the way of generating and using them is quite different, one must distinguish between finite and infinite quasirandom sequences.

A finite quasirandom sequence is optimized for a particular number of points in a particular dimensionality of space. However, the complexity of this optimization is so horrendous that exact solutions are known only for very small point sets ([Kuipers74], [Zaremba72]). The most widely used sequences in practice are the Korobov sequences.

An infinite quasirandom sequence is an algorithm which allows the generation of sequences of an arbitrary number of vectors of arbitrary length ($p$-dimensional points). The properties of these sequences are generally known only asymptotically, where they perform considerably better than truly random or pseudorandom sequences, since they give $1/N$ convergence for Monte Carlo integration instead of $1/\sqrt{N}$. The short-term distribution may, however, be rather poor, and generators should be examined carefully before being used in sensitive calculations. Major improvements are possible by shuffling, or changing the order in which the numbers are used. An effective shuffling technique is given in [Braaten79].
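
To make the idea of an infinite quasirandom sequence concrete, here is a minimal one-dimensional sketch. It uses the van der Corput sequence, which is our own choice of illustration and is not one of the sequences named in this entry; the helper names are likewise hypothetical.

```python
def van_der_corput(i, base=2):
    """i-th term (i >= 1) of the base-`base` van der Corput sequence, a point of [0, 1).

    The digits of i in the given base are mirrored about the radix point.
    """
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def mean_value(points, f):
    """Estimate the integral of f over [0, 1] as the average of f over the points."""
    return sum(f(x) for x in points) / len(points)


if __name__ == "__main__":
    pts = [van_der_corput(i) for i in range(1, 1024)]
    # Estimate of the integral of x^2 over [0, 1], whose exact value is 1/3.
    print(mean_value(pts, lambda x: x * x))
```

The first few terms are 0.5, 0.25, 0.75, 0.125, so each new point falls into the largest gap left by its predecessors; this even filling is what drives the faster convergence described above.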
126.3 random numbers

Random numbers are particular occurrences of random variables. They are necessary for Monte Carlo calculations as well as many other computerized processes. There are three kinds of random numbers:

• truly random numbers

• pseudorandom numbers

• quasirandom numbers

Version: 1 Owner: akrowne Author(s): akrowne


126.4 truly random numbers

Truly random numbers can only be generated by a physical process and cannot be generated via software. This makes it rather clumsy to use them in Monte Carlo calculations, since they must be first generated in a separate device and either sent to the computer or recorded (for example on removable storage media) for later use in calculations. Traditionally, tapes containing millions of random numbers generated using radioactive decay were available from laboratories.

Nowadays, standard digital computers often have provisions for obtaining truly random numbers, that is, numbers generated by a physical process. For instance, Intel has provided a function since their i810 chipsets which utilizes noise in a particularly prone semiconductor as a source of randomness. Often it is possible to use other incidental physical noise as a source; for example, static on the input channel of a sound card. In addition, peripheral devices (add-ons) to personal computers exist which provide truly random numbers, when the previous methods fail.
Chapter 127

11L03 — Trigonometric and exponential sums, general

127.1 Ramanujan sum

For positive integers $s$ and $n$, the complex number
$$c_s(n) = \sum_{\substack{0 \le k < n \\ (k,n)=1}} e^{2\pi i k s/n}$$
is referred to as a Ramanujan sum, or a Ramanujan trigonometric sum. Since $e^{2\pi i} = 1$, an equivalent definition is
$$c_s(n) = \sum_{k \in r(n)} e^{2\pi i k s/n}$$
where $r(n)$ is some reduced residue system mod $n$, meaning any subset of $\mathbb{Z}$ containing exactly one element of each invertible residue class mod $n$.

Using a symmetry argument about roots of unity, one can show
$$\sum_{d \mid n} c_s(d) = \begin{cases} n & \text{if } n \mid s \\ 0 & \text{otherwise.} \end{cases}$$
Applying Möbius inversion, we get
$$c_s(n) = \sum_{\substack{d \mid n \\ d \mid s}} \mu(n/d)\, d = \sum_{d \mid (n,s)} \mu(n/d)\, d$$
which shows that $c_s(n)$ is a real number, and indeed an integer. In particular $c_1(n) = \mu(n)$. Moreover,
$$c_{st}(mn) = c_s(m)\, c_t(n) \quad \text{if } (m,n) = (m,t) = (n,s) = 1.$$

Using the Chinese remainder theorem, it is not hard to show that for any fixed $s$, the function $n \mapsto c_s(n)$ is multiplicative:
$$c_s(m)\, c_s(n) = c_s(mn) \quad \text{if } (m,n) = 1.$$

If $m$ is invertible mod $n$, then the mapping $k \mapsto km$ is a permutation of the invertible residue classes mod $n$. Therefore
$$c_{sm}(n) = c_s(n) \quad \text{if } (m,n) = 1.$$

Remarks: Trigonometric sums often make convenient apparatus in number theory, since any function on a quotient ring of $\mathbb{Z}$ defines a periodic function on $\mathbb{Z}$ itself, and conversely. For another example, see Landsberg-Schaar relation.

Some writers use different notation from ours, reversing the roles of $s$ and $n$ in the expression $c_s(n)$.

The name “Ramanujan sum” was introduced by Hardy.
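
As a numerical sanity check, the sum over $k$ coprime to $n$ can be compared with the divisor-sum expression $\sum_{d \mid (n,s)} \mu(n/d)\, d$. The sketch below assumes the convention that $n$ is the modulus and $s$ the frequency; the function names are our own, and the Möbius function is computed by simple trial division.

```python
import cmath
from math import gcd


def mobius(n):
    """Möbius function mu(n), by trial division (fine for small n)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # a square divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def ramanujan_sum_direct(s, n):
    """Sum of e^{2 pi i k s / n} over 0 <= k < n with gcd(k, n) = 1."""
    return sum(cmath.exp(2j * cmath.pi * k * s / n)
               for k in range(n) if gcd(k, n) == 1)


def ramanujan_sum_mobius(s, n):
    """Sum over d | gcd(n, s) of mu(n / d) * d, an exact integer."""
    g = gcd(n, s)
    return sum(mobius(n // d) * d for d in range(1, g + 1) if g % d == 0)
```

The first function returns a complex float whose imaginary part is rounding noise, while the second returns an exact integer; agreement of the two for small $s$ and $n$ is a quick way to catch a swapped index.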
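Chapter 128

11L05 — Gauss and Kloosterman sums; generalizations

128.1 Gauss sum

Let $p$ be a prime. Let $\chi$ be any multiplicative group character on $\mathbb{Z}/p\mathbb{Z}$ (that is, any group homomorphism of multiplicative groups $(\mathbb{Z}/p\mathbb{Z})^\times \to \mathbb{C}^\times$), extended by $\chi(0) = 0$. For any $a \in \mathbb{Z}/p\mathbb{Z}$, the complex number
$$g_a(\chi) = \sum_{t \in \mathbb{Z}/p\mathbb{Z}} \chi(t)\, e^{2\pi i a t/p}$$
is called a Gauss sum on $\mathbb{Z}/p\mathbb{Z}$ associated to $\chi$.

In general, the equation $g_a(\chi) = \chi(a^{-1})\, g_1(\chi)$ (for nontrivial $a$ and $\chi$) reduces the computation of general Gauss sums to that of $g_1(\chi)$. The absolute value of $g_1(\chi)$ is always $\sqrt{p}$ as long as $\chi$ is nontrivial, and if $\chi$ is a quadratic character (that is, $\chi(t)$ is the Legendre symbol $\left(\frac{t}{p}\right)$), then the value of the Gauss sum is known to be
$$g_1(\chi) = \begin{cases} \sqrt{p}, & p \equiv 1 \pmod 4, \\ i\sqrt{p}, & p \equiv 3 \pmod 4. \end{cases}$$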
Version: 4 Owner: djao Author(s): djao
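
The closed form for the quadratic Gauss sum is easy to test numerically for small primes. The following is a minimal sketch under our own naming (`legendre` and `gauss_sum` are hypothetical helpers); it assumes $p$ is an odd prime and evaluates the Legendre symbol by Euler's criterion.

```python
import cmath
import math


def legendre(t, p):
    """Legendre symbol (t/p) for an odd prime p, via Euler's criterion."""
    r = pow(t, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def gauss_sum(a, p):
    """g_a(chi) for the quadratic character chi(t) = (t/p), with chi(0) = 0."""
    return sum(legendre(t, p) * cmath.exp(2j * cmath.pi * a * t / p)
               for t in range(p))


if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        expected = math.sqrt(p) if p % 4 == 1 else 1j * math.sqrt(p)
        print(p, gauss_sum(1, p), expected)
```

The same helper can be used to check the reduction $g_a(\chi) = \chi(a^{-1})\, g_1(\chi)$, since for a quadratic character $\chi(a^{-1}) = \chi(a)$.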


128.2 Kloosterman sum

The Kloosterman sum is one of various trigonometric sums that are useful in number theory and, more generally, in finite harmonic analysis. The original Kloosterman sum is
$$K_p(a,b) = \sum_{x \in \mathbb{F}_p^*} \exp\left(\frac{2\pi i (ax + bx^{-1})}{p}\right)$$
where $\mathbb{F}_p$ is the field of prime order $p$. Such sums have been generalized in a few different ways since their introduction in 1926. For instance, let $q$ be a prime power, $\mathbb{F}_q$ the field of $q$ elements, $\chi : \mathbb{F}_q^* \to \mathbb{C}$ a character, and $\psi : \mathbb{F}_q \to \mathbb{C}$ a mapping such that $\psi(x+y) = \psi(x)\psi(y)$ identically. The sums
$$K_\psi(\chi \mid a, b) = \sum_{x \in \mathbb{F}_q^*} \chi(x)\, \psi(ax + bx^{-1})$$
are of interest, because they come up as Fourier coefficients of modular forms.

Kloosterman sums are finite analogs of the K-Bessel functions of this kind:
$$K_s(a) = \frac{1}{2} \int_0^\infty x^{s-1} \exp\left(\frac{-a(x + x^{-1})}{2}\right) dx$$
where $\mathrm{Re}(a) > 0$.
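
The original Kloosterman sum $K_p(a,b)$ can be evaluated directly for small primes. The sketch below is illustrative only; the function name is ours, and the modular inverse is obtained from Fermat's little theorem, which assumes $p$ is prime.

```python
import cmath


def kloosterman(a, b, p):
    """K_p(a, b): sum over x in F_p^* of exp(2 pi i (a x + b x^{-1}) / p), p prime."""
    total = 0
    for x in range(1, p):
        x_inv = pow(x, p - 2, p)  # inverse of x mod p (Fermat's little theorem)
        total += cmath.exp(2j * cmath.pi * ((a * x + b * x_inv) % p) / p)
    return total
```

Because $x \mapsto -x$ pairs each term with its complex conjugate, the result is real up to rounding. A classical result not stated in this entry (the Weil bound) says that $|K_p(a,b)| \le 2\sqrt{p}$ when $a$ and $b$ are nonzero mod $p$; small cases computed with this helper are consistent with it.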
128.3 Landsberg-Schaar relation

The Landsberg-Schaar relation states that for any positive integers $p$ and $q$:
$$\frac{1}{\sqrt{p}} \sum_{n=0}^{p-1} \exp\left(\frac{2\pi i n^2 q}{p}\right) = \frac{e^{\pi i/4}}{\sqrt{2q}} \sum_{n=0}^{2q-1} \exp\left(-\frac{\pi i n^2 p}{2q}\right) \tag{128.3.1}$$
Although both sides of (128.3.1) are mere finite sums, no one has yet found a proof which uses no infinite limiting process. One way to prove it is to put $\tau = 2iq/p + \epsilon$, where $\epsilon > 0$, in this identity due to Jacobi:
$$\sum_{n=-\infty}^{+\infty} e^{-\pi n^2 \tau} = \frac{1}{\sqrt{\tau}} \sum_{n=-\infty}^{+\infty} e^{-\pi n^2/\tau} \tag{128.3.2}$$
and let $\epsilon \to 0$. The details can be found in various works on harmonic analysis, such as [1]. The identity (128.3.2) is a basic one in the theory of theta functions. It is sometimes called the functional equation for the Riemann theta function. See e.g. [2, VII.6.2].

If we just let $q = 1$ in the Landsberg-Schaar identity, it reduces to a formula for the quadratic Gauss sum mod $p$; notice that $p$ need not be prime.
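
Since both sides of the Landsberg-Schaar relation are finite sums, the identity can be spot-checked numerically. The following sketch (with a function name of our own choosing) evaluates the left and right sides in floating point for given positive integers $p$ and $q$.

```python
import cmath
import math


def landsberg_schaar_sides(p, q):
    """Return (lhs, rhs) of the Landsberg-Schaar relation for positive integers p, q."""
    lhs = sum(cmath.exp(2j * cmath.pi * n * n * q / p) for n in range(p)) / math.sqrt(p)
    rhs = (cmath.exp(1j * cmath.pi / 4) / math.sqrt(2 * q)
           * sum(cmath.exp(-1j * cmath.pi * n * n * p / (2 * q)) for n in range(2 * q)))
    return lhs, rhs


if __name__ == "__main__":
    for p, q in [(3, 1), (4, 3), (9, 2)]:
        print(p, q, *landsberg_schaar_sides(p, q))
```

For $p = 3$, $q = 1$ both sides equal $i$, in agreement with the value $i\sqrt{3}$ of the quadratic Gauss sum mod 3 divided by $\sqrt{3}$.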
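128.4 derivation of Gauss sum up to a sign

The Gauss sum can be easily evaluated up to a sign by squaring the original series
$$g_1(\chi)^2 = \sum_{s \in \mathbb{Z}/p\mathbb{Z}} \left(\frac{s}{p}\right) e^{2\pi i s/p} \sum_{t \in \mathbb{Z}/p\mathbb{Z}} \left(\frac{t}{p}\right) e^{2\pi i t/p} = \sum_{s,t \in \mathbb{Z}/p\mathbb{Z}} \left(\frac{st}{p}\right) e^{2\pi i (s+t)/p}$$
and summing over a new variable $n = s^{-1} t \pmod p$ (the terms with $s = 0$ vanish, so $s$ may be taken invertible):
$$\begin{aligned}
g_1(\chi)^2 &= \sum_{s,n \in \mathbb{Z}/p\mathbb{Z}} \left(\frac{n}{p}\right) e^{2\pi i (s+ns)/p} \\
&= \sum_{n \in \mathbb{Z}/p\mathbb{Z}} \left(\frac{n}{p}\right) \sum_{s \neq 0} e^{2\pi i s(n+1)/p} \\
&= \sum_{n \in \mathbb{Z}/p\mathbb{Z}} \left(\frac{n}{p}\right) \begin{cases} p - 1, & n \equiv -1 \pmod p \\ -1, & \text{otherwise} \end{cases} \\
&= p \left(\frac{-1}{p}\right) - \sum_{n \in \mathbb{Z}/p\mathbb{Z}} \left(\frac{n}{p}\right) = p \left(\frac{-1}{p}\right) \\
&= \begin{cases} p, & \text{if } p \equiv 1 \pmod 4, \\ -p, & \text{if } p \equiv 3 \pmod 4. \end{cases}
\end{aligned}$$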
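Chapter 129

11L40 — Estimates on character sums

129.1 Pólya-Vinogradov inequality

Theorem 8. For $m, n \in \mathbb{N}$ and $p$ a positive odd rational prime,
$$\left| \sum_{t=m}^{m+n} \left(\frac{t}{p}\right) \right| < \sqrt{p}\, \ln p.$$

Start with the following manipulations:
$$\sum_{t=m}^{m+n} \left(\frac{t}{p}\right) = \frac{1}{p} \sum_{t=0}^{p-1} \sum_{x=m}^{m+n} \sum_{a=0}^{p-1} \left(\frac{t}{p}\right) e^{2\pi i a(x-t)/p} = \frac{1}{p} \sum_{a=1}^{p-1} \sum_{x=m}^{m+n} e^{2\pi i a x/p} \sum_{t=0}^{p-1} \left(\frac{t}{p}\right) e^{-2\pi i a t/p}$$

The expression $\sum_{t=0}^{p-1} \left(\frac{t}{p}\right) e^{-2\pi i a t/p}$ is just a Gauss sum, and has magnitude $\sqrt{p}$. Hence
$$\begin{aligned}
\left| \sum_{t=m}^{m+n} \left(\frac{t}{p}\right) \right|
&\le \frac{\sqrt{p}}{p} \sum_{a=1}^{p-1} \left| e^{2\pi i a m/p} \sum_{x=0}^{n} e^{2\pi i a x/p} \right|
= \frac{\sqrt{p}}{p} \sum_{a=1}^{p-1} \left| \frac{e^{2\pi i a(n+1)/p} - 1}{e^{2\pi i a/p} - 1} \right| \\
&= \frac{\sqrt{p}}{p} \sum_{a=1}^{p-1} \left| \frac{e^{\pi i a(n+1)/p} \sin(\pi a(n+1)/p)}{e^{\pi i a/p} \sin(\pi a/p)} \right|
\le \frac{\sqrt{p}}{p} \sum_{a=1}^{p-1} \frac{1}{2\langle a/p \rangle}.
\end{aligned}$$

Here $\langle x \rangle$ denotes the absolute value of the difference between $x$ and the closest integer to $x$, i.e. $\langle x \rangle = \inf_{z \in \mathbb{Z}} \{|x - z|\}$.

Since $p$ is odd, we have
$$\frac{1}{2} \sum_{a=1}^{p-1} \frac{1}{\langle a/p \rangle} = \sum_{0 < a < p/2} \frac{p}{a} = p \sum_{a=1}^{(p-1)/2} \frac{1}{a}.$$

Now $\ln \frac{2x+1}{2x-1} > \frac{1}{x}$ for $x \ge 1$; to prove this, it suffices to show that the function $f : [1, \infty) \to \mathbb{R}$ given by $f(x) = x \ln \frac{2x+1}{2x-1}$ is decreasing and approaches 1 as $x \to \infty$. To prove the latter statement, substitute $v = 1/x$ and take the limit as $v \to 0$ using L’Hôpital’s rule. To prove the former statement, it will suffice to show that $f'$ is less than zero on the interval $[1, \infty)$. But $f'(x) \to 0$ as $x \to \infty$ and $f'$ is increasing on $[1, \infty)$, since
$$f''(x) = \frac{4}{4x^2-1} \left( \frac{4x^2+1}{4x^2-1} - 1 \right) = \frac{8}{(4x^2-1)^2} > 0$$
for $x \ge 1$, so $f'$ is less than zero for $x \ge 1$.

With this in hand, we have
$$\left| \sum_{t=m}^{m+n} \left(\frac{t}{p}\right) \right| \le \frac{\sqrt{p}}{p} \cdot p \sum_{a=1}^{(p-1)/2} \frac{1}{a} < \sqrt{p} \sum_{a=1}^{(p-1)/2} \ln \frac{2a+1}{2a-1} = \sqrt{p}\, \ln p.$$

REFERENCES

1. Vinogradov, I. M., Elements of Number Theory, 5th rev. ed., Dover, 1954.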
Chapter 130

11M06 — ζ(s) and L(s, χ)

130.1 Apéry’s constant

The number
$$\zeta(3) = \sum_{n=1}^{\infty} \frac{1}{n^3} = 1.202056903159594285399738161511449990764986292\ldots$$
has been called Apéry’s constant since 1979, when Roger Apéry published a remarkable proof that it is irrational [1].
130.2 Dedekind zeta function

Let $K$ be a number field with ring of integers $\mathcal{O}_K$. Then the Dedekind zeta function of $K$ is the analytic continuation of the following series:
$$\zeta_K(s) = \sum_{I \subset \mathcal{O}_K} \left( N^K_{\mathbb{Q}}(I) \right)^{-s}$$
where $I$ ranges over non-zero ideals of $\mathcal{O}_K$, and $N^K_{\mathbb{Q}}(I) = |\mathcal{O}_K : I|$ is the norm of $I$.

This converges for $\mathrm{Re}(s) > 1$, and has a meromorphic continuation to the whole plane, with a simple pole at $s = 1$, and no others.

The Dedekind zeta function has an Euler product expansion,
$$\zeta_K(s) = \prod_{\mathfrak{p}} \frac{1}{1 - \left( N^K_{\mathbb{Q}}(\mathfrak{p}) \right)^{-s}}$$
where $\mathfrak{p}$ ranges over prime ideals of $\mathcal{O}_K$. The Dedekind zeta function of $\mathbb{Q}$ is just the Riemann zeta function.
130.3 Dirichlet L-series

The Dirichlet L-series associated to a Dirichlet character $\chi$ is the series
$$L(\chi, s) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}. \tag{130.3.1}$$

It converges absolutely and uniformly in the domain $\mathrm{Re}(s) \ge 1 + \delta$ for any positive $\delta$, and admits the Euler product
$$L(\chi, s) = \prod_{p} \frac{1}{1 - \chi(p)\, p^{-s}} \tag{130.3.2}$$
where the product is over all primes $p$, by virtue of the multiplicativity of $\chi$. In the case where $\chi = \chi_0$ is the trivial character mod $m$, we have
$$L(\chi_0, s) = \zeta(s) \prod_{p \mid m} \left(1 - p^{-s}\right), \tag{130.3.3}$$
where $\zeta(s)$ is the Riemann zeta function. If $\chi$ is non-primitive, and $C_\chi$ is the conductor of $\chi$, we have
$$L(\chi, s) = L(\chi', s) \prod_{\substack{p \mid m \\ p \nmid C_\chi}} \left(1 - \chi'(p)\, p^{-s}\right), \tag{130.3.4}$$
where $\chi'$ is the primitive character which induces $\chi$. For non-trivial, primitive characters $\chi$ mod $m$, $L(\chi, s)$ admits an analytic continuation to all of $\mathbb{C}$ and satisfies the symmetric functional equation
$$L(\chi, s) \left(\frac{m}{\pi}\right)^{s/2} \Gamma\left(\frac{s + e_\chi}{2}\right) = \frac{g_1(\chi)}{i^{e_\chi} \sqrt{m}}\, L(\chi^{-1}, 1 - s) \left(\frac{m}{\pi}\right)^{(1-s)/2} \Gamma\left(\frac{1 - s + e_\chi}{2}\right). \tag{130.3.5}$$

Here, $e_\chi \in \{0, 1\}$ is defined by $\chi(-1) = (-1)^{e_\chi} \chi(1)$, $\Gamma$ is the gamma function, and $g_1(\chi)$ is a Gauss sum. (130.3.3), (130.3.4), and (130.3.5) combined show that $L(\chi, s)$ admits a meromorphic continuation to all of $\mathbb{C}$ for all Dirichlet characters $\chi$, and an analytic one for non-trivial $\chi$. Again assuming that $\chi$ is a non-trivial and primitive character mod $m$, if $k$ is a positive integer, we have
$$L(\chi, 1 - k) = -\frac{B_{k,\chi}}{k}, \tag{130.3.6}$$
where $B_{k,\chi}$ is a generalized Bernoulli number. By (130.3.5), taking into account the poles of $\Gamma$, we get for $k$ positive, $k \equiv e_\chi \pmod 2$,
$$L(\chi, k) = (-1)^{1 + \frac{k - e_\chi}{2}}\, \frac{g_1(\chi)}{2\, i^{e_\chi}} \left(\frac{2\pi}{m}\right)^{k} \frac{B_{k,\chi^{-1}}}{k!}. \tag{130.3.7}$$

This series was first investigated by Dirichlet, who used the non-vanishing of $L(\chi, 1)$ for non-trivial $\chi$ to prove his famous theorem on primes in arithmetic progressions. This is probably the first instance of using analysis to prove a purely number theoretic result.
130.4 Riemann θ-function

The Riemann theta function is a number theoretic function which is only really used in the derivation of the functional equation for the Riemann Xi function.

It is defined as:
$$\theta(x) = 2\omega(x) + 1$$
where $\omega$ is the Riemann omega function.

The domain of the Riemann Theta function is $x > 0$.

To give an exact form for the theta function, note that:
$$\omega(x) = \sum_{n=1}^{\infty} e^{-n^2 \pi x}$$
so that:
$$\theta(x) = 2\omega(x) + 1 = \sum_{n=1}^{\infty} e^{-n^2 \pi x} + \sum_{n=-\infty}^{-1} e^{-n^2 \pi x} + e^{0} = \sum_{n=-\infty}^{\infty} e^{-n^2 \pi x}.$$

Riemann showed that the theta function satisfied a functional equation, which was the key step in the proof of the analytic continuation for the Riemann Xi function; this has direct consequences of course with the Riemann zeta function.
130.5 Riemann Xi function

The Xi function is the function which is the key to the functional equation for the Riemann zeta function.

It is defined as:
$$\Xi(s) = \pi^{-\frac{1}{2}s}\, \Gamma\left(\tfrac{1}{2}s\right) \zeta(s)$$

Riemann himself used the notation of a lower case xi ($\xi$). The famous Riemann hypothesis is equivalent to the assertion that all the zeros of $\xi$, regarded as a function of $t$ where $s = \frac{1}{2} + it$, are real; in fact Riemann himself presented his original hypothesis in terms of that function.

Riemann’s lower case xi is defined as:
$$\xi(s) = \tfrac{1}{2}\, s(s - 1)\, \Xi(s)$$
130.6 Riemann omega function

The Riemann Omega function is used in the proof of the analytic continuation for the Riemann Xi function to the whole complex plane. It is defined as:
$$\omega(x) = \sum_{n=1}^{\infty} e^{-n^2 \pi x}$$

The Omega function satisfies a functional equation also, which can be easily derived from the theta functional equation.
130.7 functional equation for the Riemann Xi function

The Riemann Xi function satisfies a functional equation, which directly implies the Riemann Zeta function’s functional equation. The proof depends on the Riemann Theta function.
$$\Xi(s) = \Xi(1 - s)$$

You can see from the definition that the Xi function is not defined for $\mathrm{Re}(s) < 1$, since the series defining the Riemann zeta function converges only for $\mathrm{Re}(s) > 1$, so this is an important theorem for the Zeta function (in fact, there are no zeros with real part greater than 1, so without this functional equation the study of the zeta function would be very limited).
130.8 functional equation for the Riemann theta function

The functional equation for the Riemann theta function is used in the derivation of the functional equation for the Riemann Xi function. This is not as remarkable as the one for the Xi function, because it does not actually extend the domain of the function.
$$\theta\left(\frac{1}{x}\right) = \sqrt{x}\, \theta(x)$$

The proof relies on the Cauchy integral formula and the Poisson summation formula.
130.9 generalized Riemann hypothesis

This generalization of the Riemann hypothesis to arbitrary Dedekind zeta functions states that for any number field $K$, the only zeroes $s$ of the Dedekind zeta function $\zeta_K(s)$ that lie in the strip $0 \le \mathrm{Re}\, s \le 1$ satisfy $\mathrm{Re}\, s = \frac{1}{2}$.
130.10 proof of functional equation for the Riemann theta function

All sums are over all integers unless otherwise specified. Thus the Riemann theta function is
$$\theta(x) = \sum_{n} e^{-\pi n^2 x}.$$

Now, we wish to apply Poisson summation to $f(x, y) = e^{-\pi x y^2}$, in the variable $y$. $\theta(x) = \sum_n f(x, n)$, and thus is equal to $\sum_n \hat{f}(x, n)$ where
$$\hat{f}(x, n) = \int_{-\infty}^{\infty} f(x, y)\, e^{2\pi i n y}\, dy = \int_{-\infty}^{\infty} e^{-\pi x y^2 + 2\pi i n y}\, dy.$$

(OK, I thought I knew this one, but I’m stuck, and have to go. I’ll finish it later).
Chapter 131

11M99 — Miscellaneous

131.1 Riemann zeta function

131.1.1 Definition

The Riemann zeta function is defined to be the complex valued function given by the series
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, \tag{131.1.1}$$
which is valid (in fact, absolutely convergent) for all complex numbers $s$ with $\mathrm{Re}(s) > 1$. We list here some of the key properties [1] of the zeta function.

1. For all $s$ with $\mathrm{Re}(s) > 1$, the zeta function satisfies the Euler product formula
$$\zeta(s) = \prod_{p} \frac{1}{1 - p^{-s}}, \tag{131.1.2}$$
where the product is taken over all positive integer primes $p$, and converges uniformly in a neighborhood of $s$.

2. The zeta function has a meromorphic continuation to the entire complex plane with a simple pole at $s = 1$, of residue 1, and no other singularities.

3. The zeta function satisfies the functional equation
$$\zeta(s) = 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right) \Gamma(1 - s)\, \zeta(1 - s), \tag{131.1.3}$$
for any $s \in \mathbb{C}$ (where $\Gamma$ denotes the gamma function).
131.1.2 Distribution of primes

The Euler product formula (131.1.2) given above expresses the zeta function as a product over the primes $p \in \mathbb{Z}$, and consequently provides a link between the analytic properties of the zeta function and the distribution of primes in the integers. As the simplest possible illustration of this link, we show how the properties of the zeta function given above can be used to prove that there are infinitely many primes.

If the set $S$ of primes in $\mathbb{Z}$ were finite, then the Euler product formula
$$\zeta(s) = \prod_{p \in S} \frac{1}{1 - p^{-s}}$$
would be a finite product, and consequently $\lim_{s \to 1} \zeta(s)$ would exist and would equal
$$\lim_{s \to 1} \zeta(s) = \prod_{p \in S} \frac{1}{1 - p^{-1}}.$$

But the existence of this limit contradicts the fact that $\zeta(s)$ has a pole at $s = 1$, so the set $S$ of primes cannot be finite.

A more sophisticated analysis of the zeta function along these lines can be used to prove both the analytic prime number theorem and Dirichlet’s theorem on primes in arithmetic progressions. (In the case of arithmetic progressions, one also needs to examine the closely related Dirichlet L-functions in addition to the zeta function itself.) Proofs of the prime number theorem can be found in [2] and [5], and for proofs of Dirichlet’s theorem on primes in arithmetic progressions the reader may look in [2] and [1].

131.1.3 Zeros of the zeta function

A nontrivial zero of the Riemann zeta function is defined to be a root $\zeta(s) = 0$ of the zeta function with the property that $0 \le \mathrm{Re}(s) \le 1$. Any other zero is called a trivial zero of the zeta function.

The reason behind the terminology is as follows. For complex numbers $s$ with real part greater than 1, the series definition (131.1.1) immediately shows that no zeros of the zeta function exist in this region. It is then an easy matter to use the functional equation (131.1.3) to find all zeros of the zeta function with real part less than 0 (it turns out they are exactly the values $-2n$, for $n$ a positive integer). However, for values of $s$ with real part between 0 and 1, the situation is quite different, since we have neither a series definition nor a functional equation to fall back upon; and indeed to this day very little is known about the behavior of the zeta function inside this critical strip of the complex plane.

It is known that the prime number theorem is equivalent to the assertion that the zeta function has no zeros $s$ with $\mathrm{Re}(s) = 0$ or $\mathrm{Re}(s) = 1$. The celebrated Riemann hypothesis asserts that all nontrivial zeros $s$ of the zeta function satisfy the much more precise equation $\mathrm{Re}(s) = 1/2$. If true, the hypothesis would have profound consequences on the distribution of primes in the integers [5].
131.2 formulae for zeta in the critical strip

Let us use the traditional notation $s = \sigma + it$ for the complex variable, where $\sigma$ and $t$ are real numbers.
$$\zeta(s) = \frac{1}{1 - 2^{1-s}} \sum_{n=1}^{\infty} (-1)^{n+1} n^{-s}, \qquad \sigma > 0 \tag{131.2.1}$$
$$\zeta(s) = \frac{1}{s-1} + 1 - s \int_1^{\infty} \frac{x - [x]}{x^{s+1}}\, dx, \qquad \sigma > 0 \tag{131.2.2}$$
$$\zeta(s) = \frac{1}{s-1} + \frac{1}{2} - s \int_1^{\infty} \frac{((x))}{x^{s+1}}\, dx, \qquad \sigma > -1 \tag{131.2.3}$$
where $[x]$ denotes the largest integer $\le x$, and $((x))$ denotes $x - [x] - \frac{1}{2}$.

We will prove (131.2.2) and (131.2.3) with the help of this useful lemma.

Lemma: For integers $u$ and $v$ such that $0 < u < v$:
$$\sum_{n=u+1}^{v} n^{-s} = -s \int_u^v \frac{x - [x]}{x^{s+1}}\, dx + \frac{v^{1-s} - u^{1-s}}{1 - s}$$

Proof: If we can prove the special case $v = u + 1$, namely
$$(u+1)^{-s} = -s \int_u^{u+1} \frac{x - [x]}{x^{s+1}}\, dx + \frac{(u+1)^{1-s} - u^{1-s}}{1 - s} \tag{131.2.4}$$
then the lemma will follow by summing a finite sequence of cases of (131.2.4). The integral in (131.2.4) is
$$\int_u^{u+1} \frac{x - u}{x^{s+1}}\, dx = \int_0^1 \frac{t}{(u+t)^{s+1}}\, dt = \int_0^1 (u+t)^{-s}\, dt - \int_0^1 u (u+t)^{-s-1}\, dt = \frac{(u+1)^{1-s} - u^{1-s}}{1 - s} + \frac{u\left((u+1)^{-s} - u^{-s}\right)}{s}$$
so the right side of (131.2.4) is
$$(u+1)^{1-s} - u^{1-s} - u(u+1)^{-s} + u^{1-s} = (u+1)^{-s}$$
and the lemma is proved.

Now take $u = 1$ and let $v \to \infty$ in the lemma, showing that (131.2.2) holds for $\sigma > 1$. By the principle of analytic continuation, if the integral in (131.2.2) is analytic for $\sigma > 0$, then (131.2.2) holds for $\sigma > 0$. But $x - [x]$ is bounded, so the integral converges uniformly on $\sigma \ge \epsilon$ for any $\epsilon > 0$, and the claim (131.2.2) follows.

We have
$$\frac{1}{2}\, s \int_1^{\infty} x^{-s-1}\, dx = \frac{1}{2}.$$
Adding and subtracting this quantity from (131.2.2), we get (131.2.3) for $\sigma > 0$. We need to show that
$$\int_1^{\infty} \frac{((x))}{x^{s+1}}\, dx$$
is analytic on $\sigma > -1$. Write
$$f(y) = \int_1^{y} ((x))\, dx$$
and integrate by parts:
$$\int_1^{\infty} \frac{((x))}{x^{s+1}}\, dx = \lim_{x \to \infty} f(x)\, x^{-s-1} - f(1) \cdot 1^{-s-1} + (s+1) \int_1^{\infty} \frac{f(x)}{x^{s+2}}\, dx$$

The first two terms on the right are zero, and the integral converges for $\sigma > -1$ because $f$ is bounded.

Remarks: We will prove (131.2.1) in a later version of this entry.

Using formula (131.2.3), one can verify Riemann’s functional equation in the strip $-1 < \sigma < 2$. By analytic continuation, it follows that the functional equation holds everywhere. One way to prove it in the strip is to decompose the sawtooth function $((x))$ into a Fourier series and do a termwise integration. But the proof gets rather technical, because that series does not converge uniformly.
131.3 functional equation of the Riemann zeta function

Let $\Gamma$ denote the gamma function, $\zeta$ the Riemann zeta function and $s$ any complex number. Then
$$\pi^{-s/2}\, \Gamma\left(\frac{s}{2}\right) \zeta(s) = \pi^{-(1-s)/2}\, \Gamma\left(\frac{1-s}{2}\right) \zeta(1 - s).$$

Though the equation appears too intricate to be of any use, the inherent symmetry in the formula makes this the simplest method of evaluating $\zeta(s)$ at points to the left of the critical strip.
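131.4 value of the Riemann zeta function at s = 2

Here we present an application of Parseval’s equality to number theory. Let $\zeta(s)$ denote the Riemann zeta function. We will compute the value
$$\zeta(2)$$
with the help of Fourier analysis.

Example:

Let $f : \mathbb{R} \to \mathbb{R}$ be the “identity” function, defined by
$$f(x) = x, \quad \text{for all } x \in \mathbb{R}.$$

The Fourier series of this function has been computed in the entry examples of Fourier series. Thus
$$f(x) = x = a_0^f + \sum_{n=1}^{\infty} \left( a_n^f \cos(nx) + b_n^f \sin(nx) \right) = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{2}{n} \sin(nx), \quad \forall x \in (-\pi, \pi).$$

Parseval’s theorem asserts that:
$$\frac{1}{\pi} \int_{-\pi}^{\pi} f^2(x)\, dx = 2 (a_0^f)^2 + \sum_{k=1}^{\infty} \left[ (a_k^f)^2 + (b_k^f)^2 \right].$$
So we apply this to the function $f(x) = x$:
$$2 (a_0^f)^2 + \sum_{k=1}^{\infty} \left[ (a_k^f)^2 + (b_k^f)^2 \right] = \sum_{n=1}^{\infty} \frac{4}{n^2} = 4 \sum_{n=1}^{\infty} \frac{1}{n^2}$$
and
$$\frac{1}{\pi} \int_{-\pi}^{\pi} f^2(x)\, dx = \frac{1}{\pi} \int_{-\pi}^{\pi} x^2\, dx = \frac{2\pi^2}{3}.$$
Hence by Parseval’s equality
$$4 \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{2\pi^2}{3}$$
and hence
$$\zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$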
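
The result can be checked numerically by truncating the series. The sketch below uses our own function names and assumes the Fourier coefficients of $f(x) = x$ on $(-\pi, \pi)$ have absolute value $2/n$; since the tail of $\sum 1/n^2$ after $N$ terms is roughly $1/N$, agreement is only expected to about that accuracy.

```python
import math


def zeta2_partial(n_terms):
    """Partial sum of 1/n^2 for n = 1 .. n_terms."""
    return math.fsum(1.0 / (n * n) for n in range(1, n_terms + 1))


def parseval_sides(n_terms):
    """Both sides of Parseval's equality for f(x) = x on (-pi, pi).

    The coefficient side is truncated after n_terms sine coefficients b_n,
    with |b_n| = 2 / n; the integral side is (1/pi) * integral of x^2.
    """
    coefficient_side = math.fsum((2.0 / n) ** 2 for n in range(1, n_terms + 1))
    integral_side = 2.0 * math.pi ** 2 / 3.0
    return coefficient_side, integral_side


if __name__ == "__main__":
    print(zeta2_partial(100_000), math.pi ** 2 / 6)
    print(parseval_sides(100_000))
```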
Chapter 132

11N05 — Distribution of primes

132.1 Bertrand’s conjecture

Bertrand conjectured that for every positive integer $n > 1$, there exists at least one prime $p$ satisfying $n < p < 2n$. This result was proven in 1850 by Chebyshev, but the name “Bertrand’s Conjecture” remains in the literature.
132.2 Brun’s constant

Brun’s constant is the sum of the reciprocals of all twin primes
$$B = \sum_{\substack{p \\ p+2 \text{ is prime}}} \left( \frac{1}{p} + \frac{1}{p+2} \right) \approx 1.9021605.$$

Viggo Brun proved that the constant exists by using a new sieving method, which later became known as Brun’s sieve.
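
Partial sums of the series are easy to compute, although they approach the limit extremely slowly. The following sketch (helper names are our own) sieves the primes up to a bound and adds $1/p + 1/(p+2)$ over the twin prime pairs found; it illustrates the definition and is not a way to obtain the constant to many digits.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # Cross out the multiples of i starting from i*i.
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]


def brun_partial_sum(limit):
    """Sum of 1/p + 1/(p+2) over twin prime pairs (p, p+2) with p + 2 <= limit."""
    primes = primes_up_to(limit)
    prime_set = set(primes)
    return sum(1.0 / p + 1.0 / (p + 2) for p in primes if p + 2 in prime_set)


if __name__ == "__main__":
    for bound in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
        print(bound, brun_partial_sum(bound))
```

Note that 5 belongs to two pairs, (3, 5) and (5, 7), so $1/5$ is counted twice, exactly as in the defining sum.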
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132.3 proof of Bertrand’s conjecture 


This is a version of Erdös proof as it appears in Hardy and Wright. 


We start by deriving an [upper bound) on O. 
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Definition. 


O(n) = X` logp 


pgn 
p prime 


Theorem. O(n) < nlog4 

Proof. By [induction] 

The cases for n = 1 and n = 2 follow by inspection. 

For even n > 2, the case follows immediately from the case for n — 1 since n isn’t 


So let n = 2m + 1 with m > 0 and consider (1 + 1)?”"*' and its binomial expansion. Since 
Ta = a] and both iterms] occur exactly once, we find Co < 4™. Each prime p 


m m+1 
withm+l<p< 2m + 1|divides] (+?) and so 0(2m+1)—O(m+1) < log gp < mlog 4. 
By induction O(m + 1) < (m+ 1) log4 and so O(2m+4 1) < (2m + 1) log 4. 


Now we can deal with the main theorem. Suppose n > 2 and there is no prime p with 
n= p< 2n. 


Consider (1 + 1)?”. Since Er is the largest term in the binomial expansion that has 2n + 1 
me 4” 
n) 7 2mn+1' 


terms, ( 


For a prime $p$ define $r(p,n)$ to be the highest power of $p$ dividing $\binom{2n}{n}$. We first look at the highest power of $p$ dividing $n!$. Every $p$-th term contributes a factor, so we have already $[n/p]$ factors, where $[x]$ is the integer part of $x$. However, every $p^2$-th term contributes an extra factor above that, and every $p^3$-th term one more and so on. So the highest power of $p$ dividing $n!$ is $\sum_j [n/p^j]$. Now for $r(p,n)$ we need to take the factors contributed by $(2n)!$ and subtract twice the factors taken away by $n!$. This leads us to

$$r(p,n) = \sum_j \left( \left[\frac{2n}{p^j}\right] - 2\left[\frac{n}{p^j}\right] \right).$$

Now each of these terms is either 0 or 1 (as is every value of $[2x] - 2[x]$), and the terms vanish for $j > \left[\frac{\log 2n}{\log p}\right]$, so $r(p,n) \le \left[\frac{\log 2n}{\log p}\right]$ or $p^{r(p,n)} \le 2n$.


Now $\binom{2n}{n} = \prod_p p^{r(p,n)}$. By the previous inequality, primes larger than $2n$ do not contribute to this product and by assumption there are no primes between $n$ and $2n$. So

$$\binom{2n}{n} = \prod_{\substack{1 \le p \le n \\ p \text{ prime}}} p^{r(p,n)}.$$

If $p > \sqrt{2n}$, all the terms for higher powers of $p$ vanish and $r(p,n) = \left[\frac{2n}{p}\right] - 2\left[\frac{n}{p}\right]$.


For $n \ge p > \frac{2n}{3}$, $\frac{3}{2} > \frac{n}{p} \ge 1$ and so for $p > \frac{2n}{3} > \sqrt{2n}$ we can apply the previous formula for $r(p,n)$ and find that it's zero. So for all $n > 4$, the contribution of the primes larger than $\frac{2n}{3}$ is zero.
Now for $p > \sqrt{2n}$, $r(p,n)$ is at most 1, so an upper bound for the contribution of the primes between $\sqrt{2n}$ and $\frac{2n}{3}$ is the product of all primes not exceeding $\frac{2n}{3}$, which equals $e^{\theta(2n/3)}$. There are at most $\sqrt{2n}$ primes smaller than $\sqrt{2n}$ and by the inequality $p^{r(p,n)} \le 2n$ their contribution is less than $(2n)^{\sqrt{2n}}$.


Combining these we get

$$\frac{4^n}{2n+1} \le \binom{2n}{n} \le (2n)^{\sqrt{2n}} \, e^{\theta(2n/3)}.$$

Using $\theta(2n/3) < \frac{2n}{3} \log 4$ (the theorem above), $2n + 1 < (2n)^2$ and taking logarithms, we find $\frac{n \log 4}{3} \le (\sqrt{2n} + 2) \log(2n)$, which is false for large enough $n$, say $n \ge 2^{11}$, leading to a contradiction. For smaller $n$, we prove the theorem by exhibiting a sequence of primes in which each is smaller than the double of its predecessor, e.g., 3, 5, 7, 13, 23, 43, 83, 163, 317, 631, 1259, 2503.
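
The two finishing steps can be checked mechanically. The sketch below uses illustrative helper names and takes the final inequality in the form $(n/3)\log 4 \le (\sqrt{2n} + 2)\log(2n)$, as an assumption about the garbled original. It verifies that the listed chain consists of primes, each below twice its predecessor, that the inequality already fails at $n = 2^{11}$, and that small $n$ have a prime in $(n, 2n)$.

```python
from math import log, sqrt

CHAIN = [3, 5, 7, 13, 23, 43, 83, 163, 317, 631, 1259, 2503]


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def chain_is_valid(chain):
    """Every entry is prime and each entry is less than twice the previous one."""
    return all(is_prime(p) for p in chain) and all(
        q < 2 * p for p, q in zip(chain, chain[1:])
    )


def inequality_holds(n):
    """The inequality derived under the assumption 'no prime in (n, 2n)'."""
    return (n / 3) * log(4) <= (sqrt(2 * n) + 2) * log(2 * n)


def has_prime_between(n):
    """True if some prime p satisfies n < p < 2n."""
    return any(is_prime(p) for p in range(n + 1, 2 * n))


if __name__ == "__main__":
    print(chain_is_valid(CHAIN), inequality_holds(2**11))
```

Because the chain ends above $2^{11}$, every $n$ from 3 up to that bound has a chain element strictly between $n$ and $2n$.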
132.4 twin prime conjecture


Two consecutive odd numbers which are both prime are called twin primes, e.g. 5 and 7, or 41 and 43, or 1,000,000,000,061 and 1,000,000,000,063. But is there an infinite number of twin primes?
Chapter 133


11N13 — Primes in progressions 
Chapter 134


11N32 — Primes represented by 
polynomials; other multiplicative 
structure of polynomial values 


134.1 Euler four-square identity 


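The Euler four-square identity simply states that

$$\begin{aligned}
(x_1^2 + x_2^2 + x_3^2 + x_4^2)(y_1^2 + y_2^2 + y_3^2 + y_4^2) = {} & (x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4)^2 + (x_1 y_2 - x_2 y_1 + x_3 y_4 - x_4 y_3)^2 \\
& + (x_1 y_3 - x_3 y_1 + x_4 y_2 - x_2 y_4)^2 + (x_1 y_4 - x_4 y_1 + x_2 y_3 - x_3 y_2)^2.
\end{aligned}$$

It may be derived from the property of quaternions that the norm of the product is equal to the product of the norms.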
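
The identity is easy to test on random integers. In the sketch below the function names are illustrative, and the four bilinear forms are taken to be $x_1y_1 + x_2y_2 + x_3y_3 + x_4y_4$, $x_1y_2 - x_2y_1 + x_3y_4 - x_4y_3$, $x_1y_3 - x_3y_1 + x_4y_2 - x_2y_4$ and $x_1y_4 - x_4y_1 + x_2y_3 - x_3y_2$.

```python
import random


def four_square_product(x, y):
    """Return the four bilinear forms whose squares sum to |x|^2 |y|^2."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return (
        x1 * y1 + x2 * y2 + x3 * y3 + x4 * y4,
        x1 * y2 - x2 * y1 + x3 * y4 - x4 * y3,
        x1 * y3 - x3 * y1 + x4 * y2 - x2 * y4,
        x1 * y4 - x4 * y1 + x2 * y3 - x3 * y2,
    )


def norm(v):
    """Sum of squares of the entries of v."""
    return sum(t * t for t in v)


def check_identity(trials=1000, seed=0):
    """Test the identity on random integer 4-tuples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randint(-50, 50) for _ in range(4)]
        y = [rng.randint(-50, 50) for _ in range(4)]
        if norm(four_square_product(x, y)) != norm(x) * norm(y):
            return False
    return True
```

A random test is of course not a proof, but a sign error in any of the four forms would make it fail almost immediately.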
Chapter 135


11N56 — Rate of growth of arithmetic 
functions 


135.1 highly composite number 


We call $n$ a highly composite number if $d(n) > d(m)$ for all $m < n$, where $d(n)$ is the number of divisors of $n$. The first several are 1, 2, 4, 6, 12, 24. The sequence is A002182 in Sloane's encyclopedia.


The integer $n$ is superior highly composite if there is an $\epsilon > 0$ such that for all $m \ne n$,

$$d(n) n^{-\epsilon} > d(m) m^{-\epsilon}.$$

The first several superior highly composite numbers are 2, 6, 12, 60, 120, 360. The sequence is A002201 in Sloane's encyclopedia.
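
The first definition translates directly into a record search over the divisor function. This is a small sketch with illustrative function names, not an efficient algorithm.

```python
def num_divisors(n):
    """Number of positive divisors of n, by trial division up to sqrt(n)."""
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            count += 1 if d * d == n else 2
        d += 1
    return count


def highly_composite(limit):
    """All n <= limit with d(n) > d(m) for every m < n."""
    best = 0
    result = []
    for n in range(1, limit + 1):
        dn = num_divisors(n)
        if dn > best:  # new record for the divisor count
            best = dn
            result.append(n)
    return result


if __name__ == "__main__":
    print(highly_composite(400))
```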
Chapter 136


11N99 — Miscellaneous 


136.1 Chinese remainder theorem 


Let $R$ be a commutative ring with identity. If $I_1, \ldots, I_n$ are ideals of $R$ such that $I_i + I_j = R$ whenever $i \ne j$, then let

$$I = \bigcap_{i=1}^n I_i = \prod_{i=1}^n I_i.$$

The sum of quotient maps $R/I \to R/I_i$ gives an isomorphism

$$R/I \cong \prod_{i=1}^n R/I_i.$$

This has the slightly weaker consequence that given a system of congruences $x \equiv a_i \pmod{I_i}$, there is a solution in $R$ which is unique mod $I$, as the theorem is usually stated for the integers.
136.2 proof of chinese remainder theorem


First we prove that $\mathfrak{a}_i + \prod_{j \ne i} \mathfrak{a}_j = R$ for each $i$. Without loss of generality, assume that $i = 1$. Then

$$R = (\mathfrak{a}_1 + \mathfrak{a}_2)(\mathfrak{a}_1 + \mathfrak{a}_3) \cdots (\mathfrak{a}_1 + \mathfrak{a}_n),$$

since each factor $\mathfrak{a}_1 + \mathfrak{a}_j$ is $R$. Expanding the product, each term will contain $\mathfrak{a}_1$ as a factor, except the term $\mathfrak{a}_2 \mathfrak{a}_3 \cdots \mathfrak{a}_n$. So we have

$$(\mathfrak{a}_1 + \mathfrak{a}_2)(\mathfrak{a}_1 + \mathfrak{a}_3) \cdots (\mathfrak{a}_1 + \mathfrak{a}_n) \subseteq \mathfrak{a}_1 + \mathfrak{a}_2 \mathfrak{a}_3 \cdots \mathfrak{a}_n,$$

and hence the expression on the right hand side must equal $R$.


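Now we can prove that $\bigcap \mathfrak{a}_i = \prod \mathfrak{a}_i$, by induction. The statement is trivial for $n = 1$. For $n = 2$, note that

$$\mathfrak{a}_1 \cap \mathfrak{a}_2 = (\mathfrak{a}_1 \cap \mathfrak{a}_2) R = (\mathfrak{a}_1 \cap \mathfrak{a}_2)(\mathfrak{a}_1 + \mathfrak{a}_2) \subseteq \mathfrak{a}_2 \mathfrak{a}_1 + \mathfrak{a}_1 \mathfrak{a}_2 = \mathfrak{a}_1 \mathfrak{a}_2,$$

and the reverse inclusion is obvious, since each $\mathfrak{a}_i$ is an ideal. Assume that the statement is proved for $n - 1$, and consider it for $n$. Then

$$\bigcap_{i=1}^n \mathfrak{a}_i = \mathfrak{a}_1 \cap \bigcap_{i=2}^n \mathfrak{a}_i = \mathfrak{a}_1 \cap \prod_{i=2}^n \mathfrak{a}_i,$$

using the induction hypothesis in the last step. But using the fact proved above and the $n = 2$ case, we see that

$$\mathfrak{a}_1 \cap \prod_{i=2}^n \mathfrak{a}_i = \mathfrak{a}_1 \cdot \prod_{i=2}^n \mathfrak{a}_i = \prod_{i=1}^n \mathfrak{a}_i.$$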


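Finally, we are ready to prove the Chinese remainder theorem. Consider the ring homomorphism $R \to \prod R/\mathfrak{a}_i$ defined by projection on each component of the product: $x \mapsto (\mathfrak{a}_1 + x, \mathfrak{a}_2 + x, \ldots, \mathfrak{a}_n + x)$. It is easy to see that the kernel of this map is $\bigcap \mathfrak{a}_i$, which is also $\prod \mathfrak{a}_i$ by the earlier part of the proof. So it only remains to show that the map is surjective.


Accordingly, take an arbitrary element $(\mathfrak{a}_1 + x_1, \mathfrak{a}_2 + x_2, \ldots, \mathfrak{a}_n + x_n)$ of $\prod R/\mathfrak{a}_i$. Using the first part of the proof, for each $i$, we can find elements $y_i \in \mathfrak{a}_i$ and $z_i \in \prod_{j \ne i} \mathfrak{a}_j$ such that $y_i + z_i = 1$. Put

$$x = x_1 z_1 + x_2 z_2 + \cdots + x_n z_n.$$

Then for each $i$,

$$\mathfrak{a}_i + x = \mathfrak{a}_i + x_i z_i,$$

since $x_j z_j \in \mathfrak{a}_i$ for all $j \ne i$,

$$= \mathfrak{a}_i + x_i y_i + x_i z_i,$$

since $x_i y_i \in \mathfrak{a}_i$,

$$= \mathfrak{a}_i + x_i (y_i + z_i) = \mathfrak{a}_i + x_i \cdot 1 = \mathfrak{a}_i + x_i.$$

Thus the map is surjective as required, and induces the isomorphism

$$\frac{R}{\prod \mathfrak{a}_i} \to \prod \frac{R}{\mathfrak{a}_i}.$$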
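
For $R = \mathbb{Z}$ the surjectivity argument is constructive. The sketch below specializes it to pairwise coprime integer moduli (the function name `crt` and its signature are illustrative): for each modulus $m_i$ it builds an element $z_i$ that is congruent to 1 mod $m_i$ and lies in the product of the other ideals, then forms $x = \sum x_i z_i$ exactly as in the proof. It relies on Python 3.8 or later for `pow(a, -1, m)`.

```python
from math import gcd, prod


def crt(residues, moduli):
    """Solve x = residues[i] (mod moduli[i]) for pairwise coprime moduli.

    Returns (x, M) with 0 <= x < M, where M is the product of the moduli.
    """
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError("moduli must be pairwise coprime")
    M = prod(moduli)
    x = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i                 # generator of the product of the other ideals
        z_i = M_i * pow(M_i, -1, m_i)  # z_i = 1 (mod m_i), z_i = 0 (mod m_j), j != i
        x += x_i * z_i
    return x % M, M


if __name__ == "__main__":
    print(crt([2, 3, 2], [3, 5, 7]))
```

Here $y_i = 1 - z_i$ plays the role of the element of $\mathfrak{a}_i$ in the proof; it never has to be computed explicitly.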
Chapter 137


11P05 — Waring’s problem and 
variants 


137.1 Lagrange’s four-square theorem 


Lagrange's four-square theorem states that every non-negative integer may be expressed as the sum of at most four squares. By the Euler four-square identity, it is enough to show that every prime is expressible by at most four squares. It was later proved that any number not of the form $4^n(8m + 7)$ may be expressed as the sum of at most three squares.


This shows that $g(2) = G(2) = 4$, where $g$ and $G$ are the Waring functions.
137.2 Waring's problem


Waring asked whether it is possible to represent every natural number as a sum of a bounded number of nonnegative $k$'th powers, that is, whether the set $\{ n^k \mid n \in \mathbb{Z}_+ \}$ is a basis. He was led to this conjecture by Lagrange's theorem, which asserted that every natural number can be represented as a sum of four squares.


Hilbert [1] was the first to prove Waring's problem for all $k$. In his paper he did not give an explicit bound on $g(k)$, the number of powers needed, but later it was proved that

$$g(k) = 2^k + \left\lfloor \left( \tfrac{3}{2} \right)^k \right\rfloor - 2$$

except possibly for finitely many exceptional $k$, none of which are known to date.

Wooley [3], improving the result of Vinogradov, proved that the number of $k$'th powers needed to represent all sufficiently large integers is

$$G(k) \le k(\ln k + \ln \ln k + O(1)).$$
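
For small $k$ the formula for $g(k)$ can be compared with a direct computation. The sketch below uses illustrative names: `g_formula` evaluates $2^k + \lfloor (3/2)^k \rfloor - 2$ in exact integer arithmetic, and `min_powers` finds by dynamic programming the least number of positive $k$'th powers summing to a given $n$. The classical hard cases 7, 23 and 79 attain the formula for $k = 2, 3, 4$.

```python
def g_formula(k):
    """Evaluate 2^k + floor((3/2)^k) - 2 using integers only."""
    return 2**k + 3**k // 2**k - 2


def min_powers(n, k):
    """Least number of positive k-th powers whose sum is n (dynamic programming)."""
    powers = []
    b = 1
    while b**k <= n:
        powers.append(b**k)
        b += 1
    best = [0] + [n] * n  # n ones always suffice, so n is a safe initial bound
    for total in range(1, n + 1):
        best[total] = 1 + min(best[total - p] for p in powers if p <= total)
    return best[n]


if __name__ == "__main__":
    for k in range(2, 7):
        print(k, g_formula(k))
```

The dynamic program only gives lower bounds for $g(k)$ from individual integers; it does not prove the formula.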
137.3 proof of Lagrange's four-square theorem


The following proof is essentially Lagrange's original, from around 1770.


Lemma 1: For any integers $a, b, c, d, w, x, y, z$,

$$\begin{aligned}
(a^2 + b^2 + c^2 + d^2)(w^2 + x^2 + y^2 + z^2) = {} & (aw + bx + cy + dz)^2 \\
& + (ax - bw - cz + dy)^2 \\
& + (ay + bz - cw - dx)^2 \\
& + (az - by + cx - dw)^2.
\end{aligned}$$

This is the Euler four-square identity, q.v., with different notation.

Lemma 2: If an even number $2m$ is a sum of two squares then so is $m$.

Proof: Say $2m = x^2 + y^2$. Then $x$ and $y$ are both even or both odd. Therefore, in the identity

$$m = \left( \frac{x - y}{2} \right)^2 + \left( \frac{x + y}{2} \right)^2,$$

both fractions on the right side are integers.


Lemma 3: If $p$ is an odd prime, then $a^2 + b^2 + 1 = kp$ for some integers $a, b, k$ with $0 < k < p$.


Proof: Consider the values of $a^2$, and the values of $-b^2 - 1$, for

$$a = 0, 1, \ldots, \frac{p-1}{2}, \qquad b = 0, 1, \ldots, \frac{p-1}{2}.$$

No two elements of the first set are congruent mod $p$, and no two of the second. Since each set has $\frac{p+1}{2}$ elements, but there are only $p$ residue classes, something in the first set is congruent to something in the second, i.e.

$$a^2 + b^2 + 1 = kp$$

for some $k$. Clearly $0 < k < p$, since $a^2 + b^2 + 1 \le 2 \left( \frac{p-1}{2} \right)^2 + 1 < p^2$.


By Lemma 1 we need only show that an arbitrary prime $p$ is a sum of four squares. Since that is trivial for $p = 2$, suppose $p$ is odd. By Lemma 3, we know

$$mp = a^2 + b^2 + c^2 + d^2$$

for some $m, a, b, c, d$ with $0 < m < p$. To complete the proof, we will show that if $m > 1$ then $np$ is a sum of four squares for some $n$ with $1 \le n < m$.


If $m$ is even, then none, two, or all four of $a, b, c, d$ are even; in any of those cases, pairing terms of equal parity and applying Lemma 2 to each pair allows us to take $n = m/2$. So assume $m$ is odd but $> 1$. Write

$$w \equiv a, \quad x \equiv b, \quad y \equiv c, \quad z \equiv d \pmod{m}$$

where $w, x, y, z$ are all in the interval $(-m/2, m/2)$. We have

$$w^2 + x^2 + y^2 + z^2 < 4 \cdot \frac{m^2}{4} = m^2,$$
$$w^2 + x^2 + y^2 + z^2 \equiv 0 \pmod{m}.$$

So $w^2 + x^2 + y^2 + z^2 = nm$ for some integer $n$ with $0 < n < m$. (Here $n \ne 0$, since otherwise $m$ would divide each of $a, b, c, d$, so $m^2$ would divide $mp$, which is impossible for $1 < m < p$.) But now look at Lemma 1.


On the left is $nm^2p$. Evidently these three sums:

$$ax - bw - cz + dy$$
$$ay + bz - cw - dx$$
$$az - by + cx - dw$$

are multiples of $m$. The same is true of the other sum on the right in Lemma 1:

$$aw + bx + cy + dz \equiv w^2 + x^2 + y^2 + z^2 \equiv 0 \pmod{m}.$$

The equation in Lemma 1 can therefore be divided through by $m^2$. The result is an expression for $np$ as a sum of four squares. Since $0 < n < m$, the proof is complete.


Remark: Lemma 3 can be improved: it is enough for $p$ to be an odd number, not necessarily prime. But that stronger statement requires a longer proof.
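
The descent in this proof is effectively an algorithm. The following sketch, with illustrative function names, follows the three lemmas for a prime $p$: it finds $a, b$ with $a^2 + b^2 + 1 \equiv 0 \pmod p$ as in Lemma 3, then repeatedly reduces the multiplier $m$ in $mp = a^2 + b^2 + c^2 + d^2$, halving when $m$ is even (Lemma 2) and using the identity of Lemma 1 with centered residues when $m$ is odd.

```python
def lemma3(p):
    """Find a, b in [0, (p-1)/2] with a^2 + b^2 + 1 divisible by the odd prime p."""
    half = (p - 1) // 2
    squares = {(a * a) % p: a for a in range(half + 1)}
    for b in range(half + 1):
        a = squares.get((-b * b - 1) % p)
        if a is not None:
            return a, b
    raise ValueError("p must be an odd prime")


def centered(t, m):
    """Representative of t mod m in the interval (-m/2, m/2), for odd m."""
    r = t % m
    return r - m if r > m // 2 else r


def four_squares_prime(p):
    """Write the prime p as a sum of four squares by Lagrange's descent."""
    if p == 2:
        return (0, 0, 1, 1)
    a, b = lemma3(p)
    quad = [a, b, 1, 0]
    m = (a * a + b * b + 1) // p
    while m > 1:
        if m % 2 == 0:
            # Pair entries of equal parity, then halve each pair (Lemma 2).
            quad.sort(key=lambda t: t % 2)
            a, b, c, d = quad
            quad = [(a + b) // 2, (a - b) // 2, (c + d) // 2, (c - d) // 2]
            m //= 2
        else:
            a, b, c, d = quad
            w, x, y, z = (centered(t, m) for t in quad)
            n = (w * w + x * x + y * y + z * z) // m
            # Each bilinear form of Lemma 1 is divisible by m.
            quad = [
                (a * w + b * x + c * y + d * z) // m,
                (a * x - b * w - c * z + d * y) // m,
                (a * y + b * z - c * w - d * x) // m,
                (a * z - b * y + c * x - d * w) // m,
            ]
            m = n
    return tuple(sorted(abs(t) for t in quad))
```

The loop terminates because $m$ strictly decreases at every step. Combining the results for the prime factors of a general integer via Lemma 1 would give a representation of any positive integer, which is not attempted here.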
Chapter 138


11P81 — Elementary theory of 
partitions 


138.1 pentagonal number theorem 


Theorem:

$$\prod_{k=1}^{\infty} (1 - x^k) = \sum_{n=-\infty}^{\infty} (-1)^n x^{n(3n+1)/2} \qquad (138.1.1)$$

where the two sides are regarded as formal power series over $\mathbb{Z}$.


Proof: For $n \ge 0$, denote by $f(n)$ the coefficient of $x^n$ in the product on the left, i.e. write

$$\prod_{k=1}^{\infty} (1 - x^k) = \sum_{n=0}^{\infty} f(n) x^n.$$

By this definition, we have for all $n$

$$f(n) = e(n) - d(n)$$

where $e(n)$ (resp. $d(n)$) is the number of partitions of $n$ as a sum of an even (resp. odd) number of distinct summands. To fix the notation, let $P(n)$ be the set of pairs $(s, g)$ where $s$ is a natural number $> 0$ and $g$ is a decreasing mapping $\{1, 2, \ldots, s\} \to \mathbb{N}^*$ such that $\sum_x g(x) = n$. The cardinal of $P(n)$ is thus $e(n) + d(n)$, and $P(n)$ is the union of these two disjoint sets:

$$E(n) = \{ (s, g) \in P(n) \mid s \text{ is even} \},$$
$$D(n) = \{ (s, g) \in P(n) \mid s \text{ is odd} \}.$$

Now on the right side of (138.1.1) we have

$$1 + \sum_{n=1}^{\infty} (-1)^n x^{n(3n+1)/2} + \sum_{n=1}^{\infty} (-1)^n x^{n(3n-1)/2}.$$
Therefore what we want to prove is

$$e(n) = d(n) + (-1)^m \quad \text{if } n = m(3m \pm 1)/2 \text{ for some } m \qquad (138.1.2)$$
$$e(n) = d(n) \quad \text{otherwise.} \qquad (138.1.3)$$

For $m \ge 1$ we have

$$m(3m + 1)/2 = 2m + (2m - 1) + \ldots + (m + 1) \qquad (138.1.4)$$
$$m(3m - 1)/2 = (2m - 1) + (2m - 2) + \ldots + m \qquad (138.1.5)$$

Take some $(s, g) \in P(n)$, and suppose first that $n$ is not of the form (138.1.4) nor (138.1.5). Since $g$ is decreasing, there is a unique $k \in [1, s]$ such that

$$g(j) = g(1) - j + 1 \text{ for } j \in [1, k], \qquad g(j) < g(1) - j + 1 \text{ for } j \in [k+1, s].$$

If $g(s) \le k$, define $\bar{g} \colon [1, s-1] \to \mathbb{N}^*$ by

$$\bar{g}(x) = \begin{cases} g(x) + 1, & \text{if } x \in [1, g(s)], \\ g(x), & \text{if } x \in [g(s) + 1, s - 1]. \end{cases}$$

If $g(s) > k$, define $\bar{g} \colon [1, s+1] \to \mathbb{N}^*$ by

$$\bar{g}(x) = \begin{cases} g(x) - 1, & \text{if } x \in [1, k], \\ g(x), & \text{if } x \in [k+1, s], \\ k, & \text{if } x = s + 1. \end{cases}$$

In both cases, $\bar{g}$ is decreasing and $\sum_x \bar{g}(x) = n$. The mapping $g \mapsto \bar{g}$ takes an element having odd $s$ to an element having even $s$, and vice versa. Finally, the reader can verify that $\bar{\bar{g}} = g$. Thus we have constructed a bijection $E(n) \to D(n)$, proving (138.1.3).


Now suppose that $n = m(3m + 1)/2$ for some (perforce unique) $m$. The above construction still yields a bijection between $E(n)$ and $D(n)$ excluding (from one set or the other) the single element $(m, g_0)$:

$$g_0(x) = 2m + 1 - x \quad \text{for } x \in [1, m]$$

as in (138.1.4). Likewise if $n = m(3m - 1)/2$, only this element $(m, g_1)$ is excluded:

$$g_1(x) = 2m - x \quad \text{for } x \in [1, m]$$

as in (138.1.5). In both cases we deduce (138.1.2), completing the proof.


Remarks: The name of the theorem derives from the fact that the exponents $n(3n + 1)/2$ are the generalized pentagonal numbers.


The theorem was discovered and proved by Euler around 1750. This was one of the first results about what are now called theta functions, and was also one of the earliest applications of the formalism of generating functions.


The above proof is due to F. Franklin (Comptes Rendus de l'Acad. des Sciences, 92, 1881, pp. 448-450).
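
The theorem can be confirmed coefficient by coefficient up to any fixed degree. In this sketch (illustrative function names) the product of the factors $1 - x^k$ is expanded as a truncated power series and compared with the series whose only nonzero coefficients are $(-1)^m$ at the exponents $m(3m \pm 1)/2$.

```python
def euler_product_coeffs(N):
    """Coefficients of x^0..x^N in the product of (1 - x^k) for k >= 1."""
    coeffs = [0] * (N + 1)
    coeffs[0] = 1
    for k in range(1, N + 1):
        # Multiply in place by (1 - x^k); going downward keeps old values intact.
        for n in range(N, k - 1, -1):
            coeffs[n] -= coeffs[n - k]
    return coeffs


def pentagonal_coeffs(N):
    """Coefficients of x^0..x^N in the pentagonal number series."""
    coeffs = [0] * (N + 1)
    coeffs[0] = 1
    m = 1
    while m * (3 * m - 1) // 2 <= N:
        sign = -1 if m % 2 else 1
        coeffs[m * (3 * m - 1) // 2] += sign
        if m * (3 * m + 1) // 2 <= N:
            coeffs[m * (3 * m + 1) // 2] += sign
        m += 1
    return coeffs


if __name__ == "__main__":
    print(euler_product_coeffs(15))
```

Factors with $k > N$ do not affect coefficients of degree at most $N$, which is why the truncated product suffices.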
Chapter 139


11R04 — Algebraic numbers; rings of 
algebraic integers 


139.1 Dedekind domain 


A Dedekind domain is an integral domain $R$ for which:


• Every ideal in $R$ is finitely generated.
• Every nonzero prime ideal is a maximal ideal.
• The domain $R$ is integrally closed in its field of fractions.


It is worth noting that the clause "every prime is maximal" implies that the maximal length of a strictly increasing chain of prime ideals is 1, so the Krull dimension of any Dedekind domain is at most 1. In particular, the affine ring of an algebraic set is a Dedekind domain if and only if the set is normal, irreducible, and 1-dimensional.


Every Dedekind domain is a noetherian ring.
If $K$ is a number field, then $\mathcal{O}_K$, the ring of algebraic integers of $K$, is a Dedekind domain.
139.2 Dirichlet's unit theorem


Let $K$ be a number field, $\mathcal{O}_K$ be its ring of integers. Then

$$\mathcal{O}_K^* \cong \mu(K) \times \mathbb{Z}^{r+s-1}.$$

Here $\mu(K)$ is the finite cyclic group of the roots of unity in $\mathcal{O}_K^*$, $r$ is the number of real embeddings $K \to \mathbb{R}$, and $2s$ is the number of non-real complex embeddings $K \to \mathbb{C}$ (they occur in complex conjugate pairs, so $s$ is an integer).


Version: 3 Owner: sucrose Author(s): sucrose 


139.3 Eisenstein integers 


Let $\rho = (-1 + \sqrt{-3})/2$, where we arbitrarily choose $\sqrt{-3}$ to be either of the complex numbers whose square is $-3$. The Eisenstein integers are the ring $\mathbb{Z}[\rho] = \{ a + b\rho : a, b \in \mathbb{Z} \}$.
139.4 Galois representation


In general, let $K$ be any field. Write $\overline{K}$ for a separable closure of $K$, and $G_K$ for the absolute Galois group $\operatorname{Gal}(\overline{K}/K)$ of $K$. Let $A$ be an Abelian topological group. Then an ($A$-valued) Galois representation for $K$ is a continuous homomorphism

$$\rho \colon G_K \to \operatorname{Aut}(A),$$

where we endow $G_K$ with the Krull topology, and where $\operatorname{Aut}(A)$ is the group of continuous automorphisms of $A$, endowed with the compact-open topology. One calls $A$ the representation space for $\rho$.


The simplest case is where $A = \mathbb{C}^n$, the group of $n \times 1$ column vectors with complex entries. Then $\operatorname{Aut}(\mathbb{C}^n) = \mathrm{GL}_n(\mathbb{C})$, and we have what is usually called a complex representation of degree $n$. In the same manner, letting $A = F^n$, with $F$ any field (such as $\mathbb{R}$ or a finite field $\mathbb{F}_q$) we obtain the usual definition of a degree $n$ representation over $F$.


There is an alternate definition which we should also mention. Write $\mathbb{Z}[G_K]$ for the group ring of $G_K$ with coefficients in $\mathbb{Z}$. Then a Galois representation for $K$ is simply a continuous $\mathbb{Z}[G_K]$-module $A$. In other words, all the information in a representation $\rho$ is preserved in considering the representation space $A$ as a continuous $\mathbb{Z}[G_K]$-module. The equivalence of these two definitions is the standard correspondence between representations of a group and modules over its group ring.


When $A$ is profinite, the continuity requirement is equivalent to the action of $\mathbb{Z}[G_K]$ on $A$ naturally extending to a $\mathbb{Z}[[G_K]]$-module structure on $A$. The notation $\mathbb{Z}[[G_K]]$ denotes the completed group ring:

$$\mathbb{Z}[[G]] = \varprojlim_{H} \mathbb{Z}[G/H],$$

where $G$ is any profinite group, and $H$ ranges over all normal subgroups of finite index.


A notation we will be using often is the following. Suppose $\rho \colon G_K \to \operatorname{Aut}(A)$ is a representation, and $H \subseteq G_K$ a subgroup. Then we let

$$A^H = \{ a \in A \mid \rho(h) a = a, \text{ for all } h \in H \},$$

the subgroup of $A$ fixed pointwise by $H$.


Given a Galois representation $\rho$, let $G_0 = \ker \rho$. By the fundamental theorem of infinite Galois theory, since $G_0$ is a closed normal subgroup of $G_K$, it corresponds to a certain Galois extension of $K$. Naturally, this is the fixed field of $G_0$, and we denote it by $K(\rho)$. (The notation becomes better justified after we view some examples.) Notice that since $\rho$ is trivial on $G_0 = \operatorname{Gal}(\overline{K}/K(\rho))$, it factors through a representation

$$\tilde{\rho} \colon \operatorname{Gal}(K(\rho)/K) \to \operatorname{Aut}(A),$$

which is faithful. This characterizes $K(\rho)$.


In the case $A = \mathbb{R}^n$ or $A = \mathbb{C}^n$, the so-called "no small subgroups" argument implies that the image of $G_K$ is finite.


For a first application of the definition, we say that $\rho$ is discrete if for all $a \in A$, the stabilizer of $a$ in $G_K$ is open in $G_K$. This is the case when $A$ is given the discrete topology, such as when $A$ is finite and Hausdorff. The stabilizer of any $a \in A$ fixes a finite extension of $K$, which we denote by $K(a)$. One has that $K(\rho)$ is the union of all the $K(a)$.


As a second application, suppose that the image $\rho(G_K)$ is Abelian. Then the quotient $G_K/G_0$ is Abelian, so $G_0$ contains the commutator subgroup of $G_K$, which means that $K(\rho)$ is contained in $K^{\mathrm{ab}}$, the maximal Abelian extension of $K$. This is the case when $\rho$ is a character, i.e. a 1-dimensional representation over some ring,

$$\rho \colon G_K \to \mathrm{GL}_1(A) = A^\times.$$


Associated to any field $K$ are two basic Galois representations, namely those with representation spaces $A = L$ and $A = L^\times$, for any normal intermediate field $K \subseteq L \subseteq \overline{K}$, with the usual action of the Galois group on them. Both of these representations are discrete. The additive representation is rather simple if $L/K$ is finite: by the normal basis theorem, it is merely a permutation representation on the normal basis. Also, if $L = \overline{K}$ and $x \in \overline{K}$, then $K(x)$, the field obtained by adjoining $x$ to $K$, agrees with the fixed field of the stabilizer of $x$ in $G_K$. This motivates the notation "$K(a)$" introduced above.

By contrast, in general, $L^\times$ can become a rather complicated object. To look at just a piece of the representation $L^\times$, assume that $L$ contains the group $\mu_m$ of $m$-th roots of unity, where $m$ is prime to the characteristic of $K$. Then we let $A = \mu_m$. It is possible to choose an isomorphism of Abelian groups $\mu_m \cong \mathbb{Z}/m$, and it follows that our representation is $\rho \colon G_K \to (\mathbb{Z}/m)^\times$. Now assume that $m$ has the form $p^n$, where $p$ is a prime not equal to the characteristic, and set $A_n = \mu_{p^n}$. This gives a sequence of representations $\rho_n \colon G_K \to (\mathbb{Z}/p^n)^\times$, which are compatible with the natural maps $(\mathbb{Z}/p^{n+1})^\times \to (\mathbb{Z}/p^n)^\times$. This compatibility allows us to glue them together into a big representation

$$\rho \colon G_K \to \operatorname{Aut}(T_p \mathbb{G}_m) \cong \mathbb{Z}_p^\times,$$

called the $p$-adic cyclotomic representation of $K$. This representation is often not discrete. The notation $T_p \mathbb{G}_m$ will be explained below.


This example may be generalized as follows. Let $B$ be an Abelian algebraic group defined over $K$. For each integer $n$, let $B_n = B(\overline{K})[p^n]$ be the set of $\overline{K}$-rational points whose order divides $p^n$. Then we define the $p$-adic Tate module of $B$ via

$$T_p B = \varprojlim_{n} B_n.$$

It acquires a natural Galois action from the ones on the $B_n$. The two most commonly treated examples of this are the cases $B = \mathbb{G}_m$ (the multiplicative group, giving the cyclotomic representation above) and $B = E$, an elliptic curve defined over $K$.


The last thing which we shall mention about generalities is that to any Galois representation $\rho \colon G_K \to \operatorname{Aut}(A)$, one may associate the Galois cohomology groups $H^n(K, \rho)$, more commonly written $H^n(K, A)$, which are defined to be the group cohomology of $G_K$ (computed with continuous cochains) with coefficients in $A$.


Galois representations play a fundamental role in algebraic number theory, as many objects and properties related to global fields and local fields may be determined by certain Galois representations and their properties. We shall describe the local case first, and then the global case.


Let $K$ be a local field, by which we mean the fraction field of a complete DVR with finite residue field. We write $v_K$ for the normalized valuation, $\mathcal{O}_K$ for the associated DVR, $\mathfrak{m}_K$ for the maximal ideal of $\mathcal{O}_K$, $k_K = \mathcal{O}_K/\mathfrak{m}_K$ for the residue field, and $\ell$ for the characteristic of $k_K$.


Let $L/K$ be a finite Galois extension, and define $v_L$, $\mathcal{O}_L$, $\mathfrak{m}_L$, and $k_L$ accordingly. There is a natural surjection $\operatorname{Gal}(L/K) \to \operatorname{Gal}(k_L/k_K)$. We call the kernel of this map the inertia group, and write it $I(L/K) = \ker(\operatorname{Gal}(L/K) \to \operatorname{Gal}(k_L/k_K))$. Further, the $\ell$-Sylow subgroup of $I(L/K)$ is normal, and we call it the wild ramification group, and denote it by $W(L/K)$. One calls $I/W$ the tame ramification group.


It happens that the formation of these groups is compatible with extensions $L'/L/K$, in that we have surjections $I(L'/K) \to I(L/K)$ and $W(L'/K) \to W(L/K)$. This lets us define $W_K \subseteq I_K \subseteq G_K$ to be the inverse limits of the subgroups $W(L/K) \subseteq I(L/K) \subseteq \operatorname{Gal}(L/K)$, $L$ as usual ranging over all finite Galois extensions of $K$ in $\overline{K}$.


Let $\rho$ be a Galois representation for $K$ with representation space $A$. We say that $\rho$ is unramified if the inertia group $I_K$ acts trivially on $A$, or in other words $I_K \subseteq \ker \rho$ or $A^{I_K} = A$. Otherwise we say it is ramified. Similarly, we say that $\rho$ is (at most) tamely ramified if the wild ramification group acts trivially, or $W_K \subseteq \ker \rho$, or $A^{W_K} = A$; and if not we say it is wildly ramified.


We let $K_{\mathrm{ur}} = \overline{K}^{I_K}$ be the maximal unramified extension of $K$, and $K_{\mathrm{tame}} = \overline{K}^{W_K}$ be the maximal tamely ramified extension of $K$.


Unramified or tamely ramified extensions are usually much easier to study than wildly ramified extensions. In the unramified case, it results from the fact that $G_K/I_K \cong G_{k_K} \cong \hat{\mathbb{Z}}$ is pro-cyclic. Thus an unramified representation is completely determined by the action of $\rho(\sigma)$ for a topological generator $\sigma$ of $G_K/I_K$. (Such a $\sigma$ is often called a Frobenius element.)


Given a finite extension $L/K$, one defines the inertia degree $f_{L/K} = [k_L : k_K]$ and the ramification degree $e_{L/K} = [v_L(L^\times) : v_L(K^\times)]$ as usual. Then in the Galois case one may recover them as $f_{L/K} = [\operatorname{Gal}(L/K) : I(L/K)]$ and $e_{L/K} = \#I(L/K)$. The tame inertia degree, which is the non-$\ell$-part of $e_{L/K}$, is equal to $[I(L/K) : W(L/K)]$, while the wild inertia degree, which is the $\ell$-part of $e_{L/K}$, is equal to $\#W(L/K)$.


One finds that the inertia and ramification properties of $L/K$ may be computed from the ramification properties of the Galois representation $\mathcal{O}_L$.


We now turn to global fields. We shall only treat the number field case. Thus we let $K$ be a finite extension of $\mathbb{Q}$, and write $\mathcal{O}_K$ for its ring of integers. For each place $v$ of $K$, write $K_v$ for the completion of $K$ with respect to $v$. When $v$ is a finite place, we write simply $v$ for its associated normalized valuation, $\mathcal{O}_v$ for $\mathcal{O}_{K_v}$, $\mathfrak{m}_v$ for $\mathfrak{m}_{K_v}$, $k_v$ for $k_{K_v}$, and $\ell(v)$ for the characteristic of $k_v$.


For each place $v$, fix an algebraic closure $\overline{K}_v$ of $K_v$. Furthermore, choose an embedding $\overline{K} \to \overline{K}_v$. This choice is equivalent to choosing an extension of $v$ to all of $\overline{K}$, and to choosing an embedding $G_{K_v} \to G_K$. We denote the image of this last embedding by $G_v \subseteq G_K$; it is called a decomposition group at $v$. Sitting inside $G_v$ are two groups, $I_v$ and $W_v$, corresponding to the inertia and wild ramification subgroups $I_{K_v}$ and $W_{K_v}$ of $G_{K_v}$; we call the images $I_v$ and $W_v$ the inertia group at $v$ and the wild ramification group at $v$, respectively.

For a Galois representation $\rho \colon G_K \to \operatorname{Aut}(A)$ and a place $v$, it is profitable to consider the restricted representation $\rho_v = \rho|_{G_v}$. One calls $\rho$ a global representation, and $\rho_v$ a local representation. We say that $\rho$ is ramified or tamely ramified (or not) at $v$ if $\rho_v$ is (or isn't). The Tchebotarev density theorem implies that the corresponding Frobenius elements $\sigma_v \in G_v$ are dense in $G_K$, so that the union of the $G_v$ is dense in $G_K$. Therefore, it is reasonable to try to reduce questions about $\rho$ to questions about all the $\rho_v$ independently. This is a manifestation of Hasse's local-to-global principle.


Given a global Galois representation with representation space $\mathbb{Z}_p^n$ which is unramified at all but finitely many places $v$, it is a goal of number theory to prove that it arises naturally in arithmetic geometry (namely, as a subrepresentation of an étale cohomology group of a motive), and also to prove that it arises from an automorphic form. This can only be shown in certain special cases.
139.5 Gaussian integer


A complex number of the form $a + bi$, where $a, b \in \mathbb{Z}$, is called a Gaussian integer.


It is easy to see that the set $S$ of all Gaussian integers is a subring of $\mathbb{C}$; specifically, $S$ is the smallest subring containing $\{1, i\}$, whence $S = \mathbb{Z}[i]$.


$\mathbb{Z}[i]$ is a Euclidean ring, hence a principal ring, hence a unique factorization domain.


There are four units (i.e. invertible elements) in the ring $\mathbb{Z}[i]$, namely $\pm 1$ and $\pm i$. Up to multiplication by units, the primes in $\mathbb{Z}[i]$ are


• ordinary prime numbers $\equiv 3 \pmod 4$


• elements of the form $a \pm bi$ where $a^2 + b^2$ is an ordinary prime $\equiv 1 \pmod 4$ (see Thue's lemma)


• the element $1 + i$.


Using the ring of Gaussian integers, it is not hard to show, for example, that the Diophantine equation $x^2 + 1 = y^3$ has no solutions $(x, y) \in \mathbb{Z} \times \mathbb{Z}$ except $(0, 1)$.
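
The classification of primes above gives a simple primality test for Gaussian integers. The sketch below uses illustrative function names: an element on one of the axes is prime exactly when its absolute value is an ordinary prime congruent to 3 mod 4, and any other element $a + bi$ is prime exactly when its norm $a^2 + b^2$ is an ordinary prime (which covers both $1 + i$ and the factors of primes congruent to 1 mod 4).

```python
def is_prime(n):
    """Trial-division primality test for ordinary integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_gaussian_prime(a, b):
    """Decide whether a + b*i is prime in Z[i]."""
    if a == 0 or b == 0:
        n = abs(a + b)  # the nonzero coordinate, up to sign
        return is_prime(n) and n % 4 == 3
    return is_prime(a * a + b * b)


if __name__ == "__main__":
    print([(a, b) for a in range(4) for b in range(4) if is_gaussian_prime(a, b)])
```

For instance, 5 is not a Gaussian prime because $5 = (2 + i)(2 - i)$, while 3 remains prime.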
139.6 algebraic conjugates


Let $L$ be an algebraic extension of a field $K$, and let $\alpha_1 \in L$ be algebraic over $K$. Then $\alpha_1$ is the root of a minimal polynomial $f(x) \in K[x]$. Denote the other roots of $f(x)$ in $L$ by $\alpha_2, \alpha_3, \ldots, \alpha_n$. These are the algebraic conjugates of $\alpha_1$, and any two are said to be algebraically conjugate.


The notion of algebraic conjugacy is a special case of group conjugacy in the case where the group in question is the Galois group of the above minimal polynomial, viewed as acting on the roots of said polynomial.
139.7 algebraic integer


Let $K$ be an extension of $\mathbb{Q}$. A number $\alpha \in K$ is called an algebraic integer of $K$ if it is the root of a monic polynomial with coefficients in $\mathbb{Z}$, i.e., an element of $K$ that is integral over $\mathbb{Z}$. Every algebraic integer is an algebraic number (with $K = \mathbb{C}$), but the converse is false.

139.8 algebraic number

A number $\alpha \in \mathbb{C}$ is called an algebraic number if there exists a polynomial $f(x) = a_n x^n + \cdots + a_0$ such that $a_0, \ldots, a_n$, not all zero, are in $\mathbb{Q}$ and $f(\alpha) = 0$.

139.9 algebraic number field


A field $K \subseteq \mathbb{C}$ is called an algebraic number field if its dimension over $\mathbb{Q}$ is finite.
139.10 calculating the splitting of primes


Let $K|L$ be an extension of number fields, with rings of integers $\mathcal{O}_K$, $\mathcal{O}_L$. Since this extension is separable, there exists $\alpha \in K$ with $L(\alpha) = K$, and by multiplying by a suitable integer we may assume that $\alpha \in \mathcal{O}_K$ (we do not require that $\mathcal{O}_L[\alpha] = \mathcal{O}_K$; there is not, in general, an $\alpha \in \mathcal{O}_K$ with this property). Let $f \in \mathcal{O}_L[x]$ be the minimal polynomial of $\alpha$.


Now, let $\mathfrak{p}$ be a prime ideal of $L$ that does not divide $\Delta(f)\Delta(\mathcal{O}_K)^{-1}$, let $\bar{f} \in (\mathcal{O}_L/\mathfrak{p}\mathcal{O}_L)[x]$ be the reduction of $f$ mod $\mathfrak{p}$, and let $\bar{f} = \bar{f}_1 \cdots \bar{f}_n$ be its factorization into irreducible polynomials.


If there are no repeated factors, then $\mathfrak{p}$ splits in $K$ as the product

$$\mathfrak{p} = (\mathfrak{p}, f_1(\alpha)) \cdots (\mathfrak{p}, f_n(\alpha)),$$

where $f_i$ is any polynomial in $\mathcal{O}_L[x]$ reducing to $\bar{f}_i$. Note that in this case $\mathfrak{p}$ is unramified, since all $\bar{f}_i$ are pairwise coprime mod $\mathfrak{p}$.


For example, let $L = \mathbb{Q}$, $K = \mathbb{Q}(\sqrt{d})$ where $d$ is a square-free integer. Then $f = x^2 - d$. For any prime $p$, $f$ is irreducible mod $p$ if and only if it has no roots mod $p$, i.e. $d$ is a quadratic non-residue mod $p$. Using quadratic reciprocity, we can obtain a congruence condition mod $4d$ for which primes split and which do not. In general, this is possible for all extensions with abelian Galois group, using class field theory.


Furthermore, let $K'$ be the splitting field of $f$ over $L$. Then $G = \operatorname{Gal}(K'|L)$ acts on the roots of $f$, giving a map $G \to S_m$, where $m = \deg f$. Given a prime $\mathfrak{p}$ of $\mathcal{O}_L$, the Artin symbol $[\mathfrak{P}, K'|L]$ for any $\mathfrak{P}$ lying over $\mathfrak{p}$ is determined up to conjugacy by $\mathfrak{p}$. Its image in $S_m$ is a product of disjoint cycles of length $m_1, \ldots, m_n$ where $m_i = \deg \bar{f}_i$. This information is useful not just for prime splitting, but also for the calculation of Galois groups.


Another useful fact is the Frobenius density theorem, which states that every element of $G$ is $[\mathfrak{P}, K'|L]$ for infinitely many primes $\mathfrak{P}$ of $\mathcal{O}_{K'}$.


For example, let $f = x^3 + x^2 + 2 \in \mathbb{Z}[x]$. This is irreducible mod 3, and thus irreducible. Galois theory tells us that $G = \operatorname{Gal}(K'|L)$ is a transitive subgroup of $S_3$, and so is isomorphic to $C_3$ or $S_3$, but it is not obvious which. But if we consider $p = 7$, $f \equiv (x - 2)(x^2 + 3x - 1) \pmod 7$, and the quadratic factor is irreducible mod 7. Thus $G$ contains a transposition, and $G \cong S_3$.


Or let $f = x^4 + ax^2 + b$ for some integers $a, b$, and suppose $f$ is irreducible. For a prime $p$, consider the factorization of $\bar{f}$. Either it remains irreducible ($G$ contains a 4-cycle), splits as the product of irreducible quadratics ($G$ contains a cycle of the form $(12)(34)$), or $\bar{f}$ has a root. If $\beta$ is a root of $\bar{f}$, then so is $-\beta$, and so assuming $p \ne 2$, there are at least two roots, and so a 3-cycle is impossible. Thus $G$ is a transitive subgroup of $S_4$ without 3-cycles, i.e. $G \cong C_4$, $V_4$ or $D_4$.
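
The cubic example can be checked by direct computation mod $p$. The sketch below uses illustrative helper names and represents polynomials as coefficient lists, lowest degree first, an assumption made only for this example. It confirms that $x^3 + x^2 + 2$ has no root mod 3, that it equals $(x - 2)(x^2 + 3x - 1)$ mod 7, and that the quadratic factor has no root mod 7.

```python
def poly_eval_mod(f, x, p):
    """Evaluate the polynomial f (lowest degree first) at x modulo p."""
    result = 0
    for c in reversed(f):  # Horner's rule
        result = (result * x + c) % p
    return result


def roots_mod(f, p):
    """All roots of f in Z/pZ, by exhaustive search."""
    return [x for x in range(p) if poly_eval_mod(f, x, p) == 0]


def poly_mul_mod(f, g, p):
    """Product of two polynomials with coefficients reduced modulo p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out


F = [2, 0, 1, 1]  # 2 + x^2 + x^3

if __name__ == "__main__":
    print(roots_mod(F, 3), roots_mod(F, 7))
    print(poly_mul_mod([-2, 1], [-1, 3, 1], 7))
```

For polynomials of degree 2 or 3 over a field, having no root is equivalent to being irreducible, which is what makes the exhaustive root search sufficient here.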
139.11 characterization in terms of prime ideals


Let $R$ be a Dedekind domain and let $I$ be an ideal of $R$. Then there exists an ideal $J$ in $R$ such that $IJ$ is principal.
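139.12 ideal classes form an abelian group


As previously, define $C$ as the set of ideal classes, with multiplication $\cdot$ defined by

$$[\mathfrak{a}] \cdot [\mathfrak{b}] = [\mathfrak{a}\mathfrak{b}]$$

where $\mathfrak{a}, \mathfrak{b}$ are ideals of $\mathcal{O}_K$.

We shall check the group properties:


1. $[\mathfrak{a}] \cdot ([\mathfrak{b}] \cdot [\mathfrak{c}]) = [\mathfrak{a}] \cdot [\mathfrak{b}\mathfrak{c}] = [\mathfrak{a}(\mathfrak{b}\mathfrak{c})] = [\mathfrak{a}\mathfrak{b}\mathfrak{c}] = [(\mathfrak{a}\mathfrak{b})\mathfrak{c}] = [\mathfrak{a}\mathfrak{b}] \cdot [\mathfrak{c}] = ([\mathfrak{a}] \cdot [\mathfrak{b}]) \cdot [\mathfrak{c}]$
2. $[\mathcal{O}_K] \cdot [\mathfrak{b}] = [\mathfrak{b}] = [\mathfrak{b}] \cdot [\mathcal{O}_K]$


3. Consider $[\mathfrak{b}]$. Let $b$ be an integer in $\mathfrak{b}$. Then $\mathfrak{b} \supseteq (b)$, so there exists $\mathfrak{c}$ such that $\mathfrak{b}\mathfrak{c} = (b)$.
Then the ideal class $[\mathfrak{b}] \cdot [\mathfrak{c}] = [(b)] = [\mathcal{O}_K]$.


Then $C$ is a group under the operation $\cdot$.


It is abelian since $[\mathfrak{a}][\mathfrak{b}] = [\mathfrak{a}\mathfrak{b}] = [\mathfrak{b}\mathfrak{a}] = [\mathfrak{b}][\mathfrak{a}]$.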
139.13 integral basis


Let $K$ be a number field. A set of algebraic integers $\{\alpha_1, \ldots, \alpha_s\}$ is said to be an integral basis for $K$ if every $\gamma$ in $\mathcal{O}_K$ can be represented uniquely as an integer linear combination of $\{\alpha_1, \ldots, \alpha_s\}$ (i.e. one can write $\gamma = m_1 \alpha_1 + \cdots + m_s \alpha_s$ with $m_1, \ldots, m_s$ (rational) integers).


If $I$ is an ideal of $\mathcal{O}_K$, then $\{\alpha_1, \ldots, \alpha_s\} \subseteq I$ is said to be an integral basis for $I$ if every element of $I$ can be represented uniquely as an integer linear combination of $\{\alpha_1, \ldots, \alpha_s\}$.


(In the above, $\mathcal{O}_K$ denotes the ring of algebraic integers of $K$.)

An integral basis for $K$ over $\mathbb{Q}$ is a basis for $K$ over $\mathbb{Q}$.
139.14 integrally closed


A subring $R$ of a ring $S$ is said to be integrally closed in $S$ if whenever $\theta \in S$ and $\theta$ is integral over $R$, then $\theta \in R$.


The integral closure of $R$ in $S$ is integrally closed in $S$.
A ring $R$ is said to be integrally closed (or normal) if it is integrally closed in its fraction field.
139.15 transcendental root theorem


Suppose a constant $x$ is transcendental over some field $F$. Then $\sqrt{x}$ is also transcendental over $F$. Informally, this theorem is true because if $\sqrt{x}$ were algebraic, then we could take its minimal polynomial, group the terms into odd and even powers, and then show that $x$ is also algebraic over $F$, a contradiction.


In fact this theorem is true for 3rd roots, 4th roots, 5th roots, etc., but the proof is somewhat more involved.


Version: 4 Owner: kidburla2003 Author(s): kidburla2003 




Chapter 140 


11R06 — PV-numbers and 
generalizations; other special 
algebraic numbers 


140.1 Salem number 


A Salem number is a real algebraic integer $\alpha > 1$ whose algebraic conjugates all lie in the unit disk $\{ z \in \mathbb{C} \mid |z| \le 1 \}$ with at least one on the unit circle $\{ z \in \mathbb{C} \mid |z| = 1 \}$.


Powers of a Salem number $\alpha^n$ ($n = 1, 2, \ldots$) are everywhere dense modulo 1, but are not uniformly distributed modulo 1.


The smallest known Salem number is the largest real root of

$$\alpha^{10} + \alpha^9 - \alpha^7 - \alpha^6 - \alpha^5 - \alpha^4 - \alpha^3 + \alpha + 1 = 0.$$
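
The numerical value of this root is easy to approximate. The sketch below assumes the polynomial is Lehmer's polynomial $x^{10} + x^9 - x^7 - x^6 - x^5 - x^4 - x^3 + x + 1$ and locates its root greater than 1 by bisection on $[1, 2]$, where the polynomial changes sign. The function names are illustrative.

```python
def lehmer(x):
    """Lehmer's polynomial evaluated at x."""
    return x**10 + x**9 - x**7 - x**6 - x**5 - x**4 - x**3 + x + 1


def bisect_root(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi] by bisection, assuming f(lo) < 0 < f(hi)."""
    if not (f(lo) < 0 < f(hi)):
        raise ValueError("f must change sign from negative to positive")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(bisect_root(lehmer, 1.0, 2.0))
```

The polynomial is reciprocal, so the reciprocal of the computed root is its other real root, lying in $(0, 1)$.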
Chapter 141


11R11 — Quadratic extensions 


141.1 prime ideal decomposition in quadratic exten- 
sions of Q 


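Let K be a quadratic number field, i.e. K = Q(√d) for some square-free integer d. The
discriminant of the extension is

\[ D_{K/\mathbb{Q}} = \begin{cases} d, & \text{if } d \equiv 1 \bmod 4, \\ 4d, & \text{if } d \equiv 2, 3 \bmod 4. \end{cases} \]

Let $O_K$ denote the ring of integers of K. We have:

\[ O_K \cong \begin{cases} \mathbb{Z} \oplus \frac{1+\sqrt{d}}{2}\,\mathbb{Z}, & \text{if } d \equiv 1 \bmod 4, \\ \mathbb{Z} \oplus \sqrt{d}\,\mathbb{Z}, & \text{if } d \equiv 2, 3 \bmod 4. \end{cases} \]

Prime ideals of Z decompose as follows in $O_K$:

Theorem 10. Let p ∈ Z be a prime.

1. If p | d (divides), then $pO_K = (p, \sqrt{d})^2$;

2. If d is odd, then

\[ 2O_K = \begin{cases} (2, 1+\sqrt{d})^2, & \text{if } d \equiv 3 \bmod 4, \\ \left(2, \frac{1+\sqrt{d}}{2}\right)\left(2, \frac{1-\sqrt{d}}{2}\right), & \text{if } d \equiv 1 \bmod 8, \\ \text{prime}, & \text{if } d \equiv 5 \bmod 8. \end{cases} \]

3. If p ≠ 2 and p does not divide d, then

\[ pO_K = \begin{cases} (p, n+\sqrt{d})(p, n-\sqrt{d}), & \text{if } d \equiv n^2 \bmod p, \\ \text{prime}, & \text{if } d \text{ is not a square mod } p. \end{cases} \]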
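The case analysis of the theorem can be turned into a small classifier. This is a sketch under stated assumptions: d is a square-free integer and p is a prime (neither is checked), the name `decomposition_type` is hypothetical, and Euler's criterion is used to decide whether d is a square mod p.

```python
def decomposition_type(d: int, p: int) -> str:
    """Classify the prime p in Q(sqrt(d)) as 'ramified', 'split' or 'inert'.

    Assumes d is square-free (d not 0 or 1) and p is prime; not validated here.
    """
    if p == 2:
        # Case 1 (2 | d) and the d = 3 mod 4 branch of case 2 are both ramified.
        if d % 2 == 0 or d % 4 == 3:
            return "ramified"
        return "split" if d % 8 == 1 else "inert"
    if d % p == 0:
        return "ramified"
    # Euler's criterion: d is a square mod p iff d^((p-1)/2) = 1 mod p.
    return "split" if pow(d, (p - 1) // 2, p) == 1 else "inert"


for p in (2, 5, 7, 13, 29):
    print(p, decomposition_type(-7, p))
```

The loop uses d = −7, so the output can be compared with any hand computation in Q(√−7).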
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Chapter 142 


11R18 — Cyclotomic extensions 


142.1 Kronecker-Weber theorem 


The following theorem classifies the possible Abelian extensions of Q. 


Theorem 11 (Kronecker-Weber Theorem). Let L/Q be a finite abelian extension. Then
L is contained in a cyclotomic extension, i.e. there is a root of unity ζ such that L ⊆ Q(ζ).


In a similar fashion to this result, the theory of elliptic curves with complex multiplication
provides a classification of abelian extensions of quadratic imaginary number fields:


Theorem 12. Let K be a quadratic imaginary number field with ring of integers $O_K$. Let
E be an elliptic curve with complex multiplication by $O_K$ and let j(E) be the j-invariant of
E. Then:


1. K(j(E)) is the Hilbert class field of K.
2. If j(E) ≠ 0, 1728 then the maximal abelian extension of K is given by:

\[ K^{ab} = K(j(E), h(E_{\mathrm{torsion}})) \]

where $h(E_{\mathrm{torsion}})$ is the set of x-coordinates of all the torsion points of E.

Note: The map h : E → C is called a Weber function for E. We can define a Weber
function for the cases j(E) = 0, 1728 so the theorem holds true for those two cases as well.
Assume $E : y^2 = x^3 + Ax + B$, then:
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142.2 examples of regular primes 


Examples: 


1. These are all the irregular primes up to 1061:
37, 59, 67, 101, 103, 131, 149, 157, 233, 257, 263, 271, 283, 293, 307, 311, 347, 353, 379, 389, 401,
409, 421, 433, 461, 463, 467, 491, 523, 541, 547, 557, 577, 587, 593, 607, 613, 617, 619, 631, 647,
653, 659, 673, 677, 683, 691, 727, 751, 757, 761, 773, 797, 809, 811, 821, 827, 839, 877, 881, 887,
929, 953, 971, 1061.


(for this, see the On-Line Encyclopedia of Integer Sequences, sequence A000928)


2. The following are the first few class numbers of the cyclotomic fields $Q(\zeta_p)$, where $\zeta_p$
is a primitive p-th root of unity:

    p     Class number
    3     1
    5     1
    7     1
    11    1
    13    1
    17    1
    19    1
    23    3
    29    8
    31    9


Remarks:

• Notice that 37 divides 37 (the class number of $Q(\zeta_{37})$), and 59 divides 41241 = 3 · 59 · 233
(the class number of $Q(\zeta_{59})$), thus 37 and 59 are irregular primes (see above).


• The class number of the cyclotomic fields grows very quickly with p. For example,
p = 19 is the last prime for which $Q(\zeta_p)$ has class number 1.
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142.3 prime ideal decomposition in cyclotomic exten- 
sions of Q 


Let q ∈ Z be a prime greater than 2, let $\zeta_q = e^{2\pi i/q}$ and write $L = Q(\zeta_q)$ for the cyclotomic extension.
The ring of integers of L is $O_L = Z[\zeta_q]$. The discriminant of L/Q is:

\[ D_{L/\mathbb{Q}} = \pm q^{q-2} \]

and it is + exactly when q − 1 ≡ 0, 1 mod 4 (since q is odd, this means q ≡ 1 mod 4).


Proposition 4. $\sqrt{\pm q} \in Q(\zeta_q)$, with + exactly when q − 1 ≡ 0, 1 mod 4.


It can be proved that:

\[ D_{L/\mathbb{Q}} = \pm q^{q-2} = \prod_{1 \le i < j \le q-1} (\zeta_q^{i} - \zeta_q^{j})^2 \]

Taking square roots we obtain

\[ q^{\frac{q-3}{2}} \sqrt{\pm q} = \prod_{1 \le i < j \le q-1} (\zeta_q^{i} - \zeta_q^{j}) \in \mathbb{Q}(\zeta_q) \]

Hence the result holds (and the sign depends on whether q − 1 ≡ 0, 1 mod 4).


Let $K = Q(\sqrt{\pm q})$ with the corresponding sign. Thus, by the proposition we have a tower of
fields:

    L = Q(ζ_q)
        |
        K
        |
        Q

For a prime ideal pZ the decomposition in the quadratic extension K/Q is well-known (see
the entry on prime ideal decomposition in quadratic extensions of Q). The next theorem
characterizes the decomposition in the extension L/Q:


Theorem 13. Let p ∈ Z be a prime.

1. If p = q, then $qO_L = (1 - \zeta_q)^{q-1}$. In other words, the prime q is totally ramified in L.


2. If p ≠ q then pZ splits into (q − 1)/f distinct primes in $O_L$, where f is the order of
p mod q (i.e. $p^f \equiv 1$ mod q, and for all 1 ≤ n < f, $p^n \not\equiv 1$ mod q).
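Theorem 13 reduces the decomposition of p in Q(ζ_q) to computing a multiplicative order. A minimal sketch follows, assuming q is an odd prime and p is a prime; the function names are hypothetical and the triple returned is (e, f, g), the ramification index, the inertial degree and the number of primes above p.

```python
def multiplicative_order(p: int, q: int) -> int:
    """Smallest f >= 1 with p**f == 1 (mod q); assumes gcd(p, q) == 1."""
    f, power = 1, p % q
    while power != 1:
        power = (power * p) % q
        f += 1
    return f


def cyclotomic_efg(p: int, q: int) -> tuple[int, int, int]:
    """Return (e, f, g) for the prime p in Q(zeta_q), with q an odd prime."""
    if p == q:
        return (q - 1, 1, 1)          # totally ramified
    f = multiplicative_order(p, q)
    return (1, f, (q - 1) // f)       # unramified, g = (q - 1) / f


for p in (2, 5, 7, 13, 29):
    print(p, cyclotomic_efg(p, 7))
```

In every case the product e · f · g equals q − 1, the degree of the extension.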
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142.4 regular prime 


A prime p is regular if the class number of the cyclotomic field $Q(\zeta_p)$ is not divisible by p
(where $\zeta_p := e^{2\pi i/p}$ denotes a primitive p-th root of unity). An irregular prime is a prime that
is not regular.


Regular primes rose to prominence as a result of Ernst Kummer's work in the 1850's on
Fermat's Last Theorem. Kummer was able to prove Fermat's Last Theorem in the case where
the exponent is a regular prime, a result that prior to Wiles's recent work was the only
demonstration of Fermat's Last Theorem for a large class of exponents. In the course of this
work Kummer also established the following numerical criterion for determining whether a
prime is regular:


• p is regular if and only if none of the numerators of the Bernoulli numbers $B_0, B_2, B_4, \ldots, B_{p-3}$
is a multiple of p.
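Kummer's criterion is directly computable with exact rational arithmetic. The sketch below assumes the standard recurrence for Bernoulli numbers (with the convention B_1 = −1/2, which does not affect the even-indexed values used here); the helper names are hypothetical, and the approach is only practical for small primes.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n: int) -> list[Fraction]:
    """Return [B_0, ..., B_n] using sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1."""
    b = [Fraction(1)]
    for m in range(1, n + 1):
        s = sum(comb(m + 1, k) * b[k] for k in range(m))
        b.append(-s / (m + 1))
    return b


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small range used below."""
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))


def is_regular(p: int, b: list[Fraction]) -> bool:
    """Kummer's criterion for an odd prime p; needs B_0..B_{p-3} in b."""
    return all(b[k].numerator % p != 0 for k in range(2, p - 2, 2))


B = bernoulli_numbers(110)
irregular = [p for p in range(3, 110, 2) if is_prime(p) and not is_regular(p, B)]
print(irregular)
```

The index B_0 is skipped in the check because its numerator is 1, so it never contributes.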


Based on this criterion it is possible to give a heuristic argument that the regular primes
have density $e^{-1/2}$ in the set of all primes [1]. Despite this, there is no known proof that the
set of regular primes is infinite, although it is known that there are infinitely many irregular
primes.


REFERENCES 
1. Kenneth Ireland & Michael Rosen, A Classical Introduction to Modern Number Theory, 
Springer-Verlag, New York, Second Edition, 1990. 
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Chapter 143 


11R27 — Units and factorization 


143.1 regulator 


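Let K be a number field with $[K : Q] = n = r_1 + 2r_2$. Here $r_1$ denotes the number of
real embeddings

\[ \sigma_i : K \hookrightarrow \mathbb{R}, \quad 1 \le i \le r_1 \]

while $r_2$ is half of the number of complex embeddings

\[ \tau_j : K \hookrightarrow \mathbb{C}, \quad 1 \le j \le r_2 \]

Note that $\{\tau_j, \bar{\tau}_j \mid 1 \le j \le r_2\}$ are all the complex embeddings of K. Let $r = r_1 + r_2$ and
for 1 ≤ i ≤ r define the "norm" in K corresponding to each embedding:

\[ \|\cdot\|_i : K^{\times} \to \mathbb{R}^{+} \]
\[ \|\alpha\|_i = |\sigma_i(\alpha)|, \quad 1 \le i \le r_1 \]
\[ \|\alpha\|_{r_1 + j} = |\tau_j(\alpha)|^2, \quad 1 \le j \le r_2 \]

Let $O_K$ be the ring of integers of K. By Dirichlet's unit theorem we know that the rank of
the unit group $O_K^{\times}$ is exactly $r - 1 = r_1 + r_2 - 1$. Let

\[ \{\epsilon_1, \epsilon_2, \ldots, \epsilon_{r-1}\} \]

be a fundamental system of generators of $O_K^{\times}$ modulo roots of unity (that is, modulo the
torsion subgroup). Let A be the r × (r − 1) matrix

\[ A = \begin{pmatrix} \log\|\epsilon_1\|_1 & \log\|\epsilon_2\|_1 & \cdots & \log\|\epsilon_{r-1}\|_1 \\ \log\|\epsilon_1\|_2 & \log\|\epsilon_2\|_2 & \cdots & \log\|\epsilon_{r-1}\|_2 \\ \vdots & \vdots & & \vdots \\ \log\|\epsilon_1\|_r & \log\|\epsilon_2\|_r & \cdots & \log\|\epsilon_{r-1}\|_r \end{pmatrix} \]

and let $A_i$ be the (r − 1) × (r − 1) matrix obtained by deleting the i-th row from A, 1 ≤ i ≤ r.
It can be checked that the determinant of $A_i$, $\det A_i$, is independent up to sign of the choice
of fundamental system of generators of $O_K^{\times}$ and is also independent of the choice of i.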
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Definition 5. The regulator of K is defined to be 


\[ \mathrm{Reg}_K = |\det A_i| \]
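As a small worked instance of this definition, consider the real quadratic field Q(√2), where r_1 = 2, r_2 = 0 and the unit rank is 1. The sketch below takes it as an assumption (a standard fact, not stated in this entry) that 1 + √2 is a fundamental unit; the variable names are illustrative.

```python
import math

# K = Q(sqrt(2)): two real embeddings, sending sqrt(2) to +sqrt(2) and -sqrt(2).
# Assumed fundamental unit: eps = 1 + sqrt(2).
eps_embeddings = [1 + math.sqrt(2), 1 - math.sqrt(2)]  # sigma_1(eps), sigma_2(eps)

# A is the r x (r - 1) = 2 x 1 matrix of logarithms of the "norms" ||eps||_i.
A = [[math.log(abs(s))] for s in eps_embeddings]

# Deleting row i leaves a 1 x 1 matrix; its determinant is the remaining entry.
regulators = [abs(A[1 - i][0]) for i in range(2)]
print(regulators)
```

Both entries of `regulators` agree, illustrating that the value does not depend on which row is deleted.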
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Chapter 144 


11R29 — Class numbers, class groups, 
discriminants 


144.1 Existence of Hilbert Class Field 


Let K be a number field. There exists a finite extension E of K with the following
properties:

1. $[E : K] = h_K$, where $h_K$ is the class number of K.

2. E is Galois over K.

3. The ideal class group of K is isomorphic to the Galois group of E over K.

4. Every ideal of $O_K$ becomes a principal ideal in the ring extension $O_E$.

5. Every prime ideal P of $O_K$ decomposes into the product of $h_K / f$ prime ideals in $O_E$,
where f is the order of [P] in the ideal class group of $O_K$.

There is a unique field E satisfying the above five properties, and it is known as the Hilbert
class field of K.
The field E may also be characterized as the maximal abelian unramified extension of K.
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144.2 class number formula 


Let K be a number field with $[K : Q] = n = r_1 + 2r_2$, where $r_1$ denotes the number of
real embeddings of K, and $2r_2$ is the number of complex embeddings of K. Let

\[ \zeta_K(s) \]

be the Dedekind zeta function of K. Also define the following invariants:


1. $h_K$ is the class number, the number of elements in the ideal class group of K.
2. $\mathrm{Reg}_K$ is the regulator of K.
3. $w_K$ is the number of roots of unity contained in K.
4. $D_K$ is the discriminant of the extension K/Q.


Then:


Theorem 14 (Class Number Formula). The Dedekind zeta function of K, $\zeta_K(s)$, converges absolutely
for Re(s) > 1 and extends to a meromorphic function defined for Re(s) > 1 − 1/n with only
one simple pole at s = 1. Moreover:

\[ \lim_{s \to 1} (s - 1)\,\zeta_K(s) = \frac{2^{r_1} \cdot (2\pi)^{r_2} \cdot h_K \cdot \mathrm{Reg}_K}{w_K \cdot \sqrt{|D_K|}} \]

Note: This is the most general "class number formula". In particular cases, for example
when K is a cyclotomic extension of Q, there are particular and more refined class number
formulas.
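The right-hand side of the formula can be checked numerically in a simple case. The sketch below uses K = Q(i) and takes its invariants as assumptions (standard facts, not derived here): class number 1, regulator 1 because the unit rank is 0, four roots of unity, and discriminant −4. It also assumes the factorization of the zeta function of Q(i) as ζ(s)·L(s, χ) with χ the nontrivial character mod 4, so that the residue equals the Leibniz series.

```python
import math

# Invariants of K = Q(i), taken as known (assumptions of this sketch).
r1, r2 = 0, 1          # no real embeddings, one pair of complex embeddings
h, reg = 1, 1.0        # class number 1; regulator 1 since the unit rank is 0
w, disc = 4, -4        # roots of unity {1, -1, i, -i}; discriminant -4

residue = (2**r1 * (2 * math.pi)**r2 * h * reg) / (w * math.sqrt(abs(disc)))

# L(1, chi) = 1 - 1/3 + 1/5 - ... ; the partial sum converges slowly.
partial_sum = sum((-1)**k / (2 * k + 1) for k in range(200_000))

print(residue, partial_sum)
```

Both numbers approximate π/4, which is what the formula predicts for this field.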
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144.3 discriminant 


144.3.1 Definitions 


Let R be any Dedekind domain with field of fractions K. Fix a finite dimensional field
extension L/K and let S denote the integral closure of R in L. For any basis $x_1, \ldots, x_n$ of
L over K, the determinant

\[ \Delta(x_1, \ldots, x_n) := \det[\mathrm{Tr}(x_i x_j)], \]

whose entries are the trace of $x_i x_j$ over all pairs i, j, is called the discriminant of the basis
$x_1, \ldots, x_n$. The ideal in R generated by all discriminants of the form

\[ \Delta(x_1, \ldots, x_n), \quad x_i \in S \]

is called the discriminant ideal of S over R, and denoted Δ(S/R).


In the special case where S is a free R-module, the discriminant ideal Δ(S/R) is always a
principal ideal, generated by any discriminant of the form $\Delta(x_1, \ldots, x_n)$ where $x_1, \ldots, x_n$ is
a basis for S as an R-module. In particular, this situation holds whenever K and L are
number fields.
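The definition det[Tr(x_i x_j)] can be evaluated by hand for quadratic fields, and a short script makes the bookkeeping explicit. This is a sketch with hypothetical helper names, assuming elements a + b√d of Q(√d) are stored as pairs (a, b) of rationals, so that the trace to Q is 2a.

```python
from fractions import Fraction


def mul(x, y, d):
    """Multiply a + b*sqrt(d) and c + e*sqrt(d), each stored as a pair (a, b)."""
    (a, b), (c, e) = x, y
    return (a * c + b * e * d, a * e + b * c)


def trace(x):
    """Trace from Q(sqrt(d)) to Q of a + b*sqrt(d), namely 2a."""
    return 2 * x[0]


def basis_discriminant(basis, d):
    """det[Tr(x_i x_j)] for a two-element basis of Q(sqrt(d)) over Q."""
    m = [[trace(mul(x, y, d)) for y in basis] for x in basis]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


one = (Fraction(1), Fraction(0))
sqrt_d = (Fraction(0), Fraction(1))
half = (Fraction(1, 2), Fraction(1, 2))   # (1 + sqrt(d)) / 2

print(basis_discriminant([one, sqrt_d], -7))   # basis 1, sqrt(d)
print(basis_discriminant([one, half], -7))     # basis 1, (1 + sqrt(d)) / 2
```

The basis {1, √d} gives 4d and the basis {1, (1 + √d)/2} gives d, the two values that appear as discriminants of quadratic fields according to the residue of d mod 4.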


144.3.2 Properties 


The discriminant is so named because it allows one to determine which ideals of R are
ramified in S. Specifically, the prime ideals of R that ramify in S are precisely the ones that
contain the discriminant ideal Δ(S/R). In the case R = Z, Minkowski's theorem states that
any ring of integers S of a number field larger than Q has discriminant strictly smaller than
Z itself, and this fact combined with the previous result shows that any number field K ≠ Q
admits at least one ramified prime over Q.
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144.4 ideal class 


Let K be a number field. Let a and b be ideals in $O_K$ (the ring of algebraic integers of K).
Define a relation ~ on the ideals of $O_K$ in the following way: write a ~ b if there exist
nonzero elements α and β of $O_K$ such that (α)a = (β)b.


The relation ~ is an equivalence relation, and the equivalence classes under ~ are known as 


ideal classes. 


The number of equivalence classes, denoted by h or hx, is called the class number of K. 


Note that the set of ideals of any ring R forms an abelian semigroup with the product of ideals
as the semigroup operation. By replacing ideals by ideal classes, it is possible to define a
group on the ideal classes of $O_K$ in the following way.


Let a, b be ideals of $O_K$. Denote the ideal classes of which a and b are representatives by [a]
and [b] respectively. Then define · by


[a] · [b] = [ab]


Let C = {[a] | a ≠ (0), a an ideal of $O_K$}. With the above definition of multiplication, C is
an abelian group, called the ideal class group of K.

Note that the ideal class group of K is simply the quotient group of the ideal group of K by
the subgroup of principal fractional ideals.
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144.5 ray class group 


Let m be a modulus for a number field K. The ray class group of K mod m is the
quotient group $I^{m} / K_{m,1}$, where


• $I^{m}$ is the subgroup of the ideal group of K generated by all prime ideals which do not
occur in the factorization of m.


• $K_{m,1}$ is the subgroup of $I^{m}$ consisting of all principal ideals in the ring of integers of K
having the form (α) where α is multiplicatively congruent to 1 mod m.
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Chapter 145 


11R32 — Galois theory 


145.1 Galois criterion for solvability of a polynomial 
by radicals 


Let f ∈ F[x] be a polynomial over a field F, and let K be its splitting field. Then K is a
radical extension if and only if the Galois group Gal(K/F) is a solvable group.
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Chapter 146 


11R34 — Galois cohomology 


146.1 Hilbert Theorem 90 


Let L/K be a finite Galois extension with Galois group G = Gal(L/K). Then the first
Galois cohomology group $H^1(G, L^{*})$ is 0.


A corollary (and the actual result that Hilbert called his Theorem 90) is that, if G is cyclic
with generator σ, then x ∈ L has norm 1 if and only if

\[ x = y / \sigma(y) \]

for some nonzero y ∈ L.
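The corollary can be illustrated in the extension Q(i)/Q, where the generator σ is complex conjugation and the norm of u + vi is u² + v². The sketch below is only an illustration with hypothetical function names; it stores an element of Q(i) as a pair of rationals and checks that y/σ(y) always has norm 1.

```python
from fractions import Fraction


def hilbert90_element(a: int, b: int) -> tuple[Fraction, Fraction]:
    """Return x = y / conj(y) for y = a + b*i, as (real part, imaginary part)."""
    n = a * a + b * b          # y * conj(y); assumes (a, b) != (0, 0)
    # y / conj(y) = y^2 / (y * conj(y)) = (a^2 - b^2 + 2ab*i) / (a^2 + b^2)
    return (Fraction(a * a - b * b, n), Fraction(2 * a * b, n))


def norm(x: tuple[Fraction, Fraction]) -> Fraction:
    """Norm from Q(i) to Q: N(u + v*i) = u^2 + v^2."""
    return x[0] ** 2 + x[1] ** 2


x = hilbert90_element(2, 1)
print(x, norm(x))
```

Clearing denominators in the identity norm(x) = 1 yields integer solutions of u² + v² = w², so in this setting the corollary parametrizes Pythagorean triples.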
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Chapter 147 


11R37 — Class field theory 


147.1 Artin map 


Let L/K be a Galois extension of number fields, with rings of integers $O_L$ and $O_K$. For any
finite prime P ⊂ L lying over a prime p ∈ K, let D(P) denote the decomposition group
of P, let T(P) denote the inertia group of P, and let $l := O_L/P$ and $k := O_K/p$ be the
residue fields. The exact sequence

\[ 1 \to T(P) \to D(P) \to \mathrm{Gal}(l/k) \to 1 \]

yields an isomorphism D(P)/T(P) ≅ Gal(l/k). In particular, there is a unique element
in D(P)/T(P), denoted [L/K, P], which maps to the q-th power Frobenius map $\mathrm{Frob}_q$ ∈
Gal(l/k) under this isomorphism (where q is the number of elements in k). The notation
[L/K, P] is referred to as the Artin symbol of the extension L/K at P.


If we add the additional assumption that p is unramified, then T(P) is the trivial group,
and [L/K, P] in this situation is an element of D(P) ⊂ Gal(L/K), called the Frobenius
automorphism of P.


If, furthermore, L/K is an abelian extension (that is, Gal(L/K) is an abelian group), then
[L/K, P] = [L/K, P′] for any other prime P′ ⊂ L lying over p. In this case, the Frobenius
automorphism [L/K, P] is denoted (L/K, p); the change in notation from P to p reflects the
fact that the automorphism is determined by p ∈ K independent of which prime P of L
above it is chosen for use in the above construction.


Definition 4. Let S be a finite set of primes of K, containing all the primes that ramify in
L. Let $I_K^S$ denote the subgroup of the group $I_K$ of fractional ideals of K which is generated
by all the primes in K that are not in S. The Artin map

\[ \phi_{L/K} : I_K^S \to \mathrm{Gal}(L/K) \]

is the map given by $\phi_{L/K}(p) := (L/K, p)$ for all primes p ∉ S, extended linearly to $I_K^S$.
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147.2 'Tchebotarev density theorem 


Let L/K be any finite Galois extension of number fields with Galois group G. For any
conjugacy class C ⊂ G, the subset of prime ideals p ⊂ K which are unramified in L and
satisfy the property

    [L/K, P] ∈ C for any prime P ⊂ L containing p

has analytic density |C|/|G|, where [L/K, P] denotes the Artin symbol at P.
Note that the conjugacy class of [L/K, P] is independent of the choice of P lying
over p, since any two such choices of primes are related by a Galois automorphism and their
corresponding Artin symbols are conjugate by this same automorphism.
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147.3 modulus 


A modulus for a number field K is a formal product

\[ \prod_{p} p^{n_p} \]

where

• The product is taken over all finite primes and infinite primes of K

• The exponents $n_p$ are nonnegative integers

• All but finitely many of the $n_p$ are zero

• For every real prime p, the exponent $n_p$ is either 0 or 1

• For every complex prime p, the exponent $n_p$ is 0


A modulus can be written as a product of its finite part

\[ \prod_{p \text{ finite}} p^{n_p} \]

and its infinite part

\[ \prod_{p \text{ real}} p^{n_p}, \]

with the finite part equal to some ideal in the ring of integers $O_K$ of K, and the infinite part
equal to the product of some subcollection of the real primes of K.
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147.4 multiplicative congruence 


Let p be any real prime of a number field K, and write i : K → R for the corresponding
real embedding of K. We say two elements α, β ∈ K are multiplicatively congruent mod p if
the real numbers i(α) and i(β) are either both positive or both negative.


Now let p be a finite prime of K, and write $(O_K)_p$ for the localization of the ring of integers
$O_K$ of K at p. For any natural number n, we say α and β are multiplicatively congruent mod
$p^n$ if they are members of the same coset of the subgroup $1 + p^n (O_K)_p$ of the multiplicative
group $K^{\times}$ of K.


If m is any modulus for K, with factorization

\[ m = \prod_{p} p^{n_p}, \]

then we say α and β are multiplicatively congruent mod m if they are multiplicatively
congruent mod $p^{n_p}$ for every prime p appearing in the factorization of m.


Multiplicative congruence of α and β mod m is commonly denoted using the notation

\[ \alpha \equiv^{*} \beta \pmod{m}. \]
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147.5 ray class field 


Proposition 5. Let L/K be a finite abelian extension of number fields, and let $O_K$ be the
ring of integers of K. There exists an integral ideal C ⊂ $O_K$, divisible by precisely the
prime ideals of K that ramify in L, such that

\[ ((\alpha), L/K) = 1, \quad \forall \alpha \in K^{*}, \ \alpha \equiv 1 \bmod C \]

where ((α), L/K) is the Artin map.

Definition 6. The conductor of a finite abelian extension L/K is the largest ideal $C_{L/K}$ ⊂
$O_K$ satisfying the above properties.

Note that there is a "largest ideal" with this condition because if the proposition is true for
$C_1$, $C_2$ then it is also true for $C_1 + C_2$.


Definition 7. Let J be an integral ideal of K. A ray class field of K (modulo J) is a finite
abelian extension $K_J/K$ with the property that for any other finite abelian extension L/K
with conductor $C_{L/K}$,

\[ C_{L/K} \mid J \ \Rightarrow \ L \subset K_J \]

Note: It can be proved that there is a unique ray class field with a given conductor. In words,
the ray class field is the biggest abelian extension of K with a given conductor (although
the conductor of $K_J$ does not necessarily equal J; see example 2).


Remark: Let p be a prime of K unramified in L, and let P be a prime above p. Then
(p, L/K) = 1 if and only if the extension of residue fields is of degree 1,

\[ [O_L/P : O_K/p] = 1 \]

if and only if p splits completely in L. Thus we obtain a characterization of the ray class
field of conductor C as the abelian extension of K such that a prime of K splits completely
if and only if it is of the form

\[ (\alpha), \quad \alpha \in K^{*}, \ \alpha \equiv 1 \bmod C \]

Examples:


1. The ray class field of Q of conductor NZ is the N-th cyclotomic extension of Q. More
concretely, let $\zeta_N$ be a primitive N-th root of unity. Then

\[ \mathbb{Q}_{N\mathbb{Z}} = \mathbb{Q}(\zeta_N) \]

2. We have

\[ \mathbb{Q}_{(2)} = \mathbb{Q}(\zeta_2) = \mathbb{Q} \]

so the conductor of $Q_{(2)}/Q$ is (1).


3. $K_{(1)}$, the ray class field of conductor (1), is the maximal abelian extension of K which
is unramified everywhere. It is, in fact, the Hilbert class field of K.


REFERENCES 
1. Artin/Tate, Class Field Theory. W.A.Benjamin Inc., New York. 
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Chapter 148 


11R56 — Adele rings and groups 


148.1 adle 


Let K be a number field. For each finite prime v of K, let $o_v$ denote the valuation ring of the
completion $K_v$ of K at v. The adèle group $A_K$ of K is defined to be the restricted direct product
of the collection of locally compact additive groups $\{K_v\}$ over all primes v of K (both finite
primes and infinite primes), with respect to the collection of compact open subgroups $\{o_v\}$
defined for all finite primes v.


The set $A_K$ inherits addition and multiplication operations (defined pointwise) which make
it into a topological ring. The original field K embeds as a ring into $A_K$ via the map

\[ x \mapsto \prod_{v} x_v \]

defined for x ∈ K, where $x_v$ denotes the image of x in $K_v$ under the embedding K ↪ $K_v$.
Note that $x_v \in o_v$ for all but finitely many v, so that the element x is sent under the above
definition into the restricted direct product as claimed.


It turns out that the image of K in $A_K$ is a discrete set and the quotient group $A_K/K$ is a
compact space in the quotient topology.
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148.2 idle 


Let K be a number field. For each finite prime v of K, let $o_v$ be the valuation ring of the
completion $K_v$ of K at v, and let $U_v$ be the group of units in $o_v$. Then each group $U_v$ is a
compact open subgroup of the group of units $K_v^{*}$ of $K_v$. The idèle group $I_K$ of K is defined
to be the restricted direct product of the multiplicative groups $\{K_v^{*}\}$ with respect to the
compact open subgroups $\{U_v\}$, taken over all finite primes and infinite primes v of K.


The units $K^{*}$ in K embed into $I_K$ via the diagonal embedding

\[ x \mapsto \prod_{v} x_v \]

where $x_v$ is the image of x under the embedding K ↪ $K_v$ of K into its completion $K_v$.
As in the case of adèles, the group $K^{*}$ is a discrete subgroup of the group of idèles $I_K$, but
unlike the case of adèles, the quotient group $I_K/K^{*}$ is not a compact group. It is, however,
possible to define a certain subgroup of the idèles (the subgroup of norm 1 elements) which
does have compact quotient under $K^{*}$.


Warning: The group $I_K$ is a multiplicative subgroup of the ring of adèles $A_K$, but the
topology on $I_K$ is different from the subspace topology that $I_K$ would have as a subset of
$A_K$.
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148.3 restricted direct product 


Let $\{G_v\}_{v \in V}$ be a collection of locally compact topological groups. For all but finitely many
v ∈ V, let $H_v \subset G_v$ be a compact open subgroup of $G_v$. The restricted direct product of the
collection $\{G_v\}$ with respect to the collection $\{H_v\}$ is the subgroup

\[ G := \Big\{ (g_v)_{v \in V} \in \prod_{v \in V} G_v \ \Big|\ g_v \in H_v \text{ for all but finitely many } v \in V \Big\} \]

of the direct product $\prod_{v \in V} G_v$.

We define a topology on G as follows. For every finite subset S ⊂ V that contains all the
elements v for which $H_v$ is undefined, form the topological group

\[ G_S := \prod_{v \in S} G_v \times \prod_{v \notin S} H_v \]

consisting of the direct product of the $G_v$'s, for v ∈ S, and the $H_v$'s, for v ∉ S. The
topological group $G_S$ is a subset of G for each such S, and we take for a topology on G the
weakest topology such that the $G_S$ are open subsets of G, with the subspace topology on
each $G_S$ equal to the topology that $G_S$ already has in its own right.
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Chapter 149 


11R99 — Miscellaneous 


149.1 Henselian field 


Let |·| be a non-archimedean valuation on a field K. Define the set V := {x : |x| ≤ 1}. We can see
that V is closed under addition as |·| is an ultrametric, and in fact V is an additive group.
The other valuation axioms ensure that V is a ring. We call V the valuation ring of K with
respect to |·|. Note that the field of fractions of V is K.


Let μ := {x : |x| < 1}. It is easy to show that this is a maximal ideal of V. Let R := V/μ
be called the residue field.


The map res : V → V/μ given by x ↦ x + μ is called the residue map. We extend the
definition of the residue map to sequences of elements from V, and hence to V[X], so that if
f(X) ∈ V[X] is given by $\sum_{i \le n} a_i X^i$ then res(f) ∈ R[X] is given by $\sum_{i \le n} \mathrm{res}(a_i) X^i$.


Hensel Property: Let f(x) ∈ V[x]. Suppose res(f)(x) has a simple root e ∈ R. Then f(x)
has a root e′ ∈ V and res(e′) = e.


Any valued field satisfying Hensel's property shall be called henselian. The completion of
a non-archimedean valued field K with respect to the valuation (cf. constructing the
reals from the rationals as the completion with respect to the standard metric) is a henselian field.


Every non-archimedean valued field K has a unique (up to isomorphism) smallest henselian
field $K^h$ containing it. We call $K^h$ the henselisation of K.
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149.2 valuation 


Let K be a field. A valuation on K is a function |·| : K → R satisfying the
properties:

1. |x| ≥ 0 for all x ∈ K, with equality if and only if x = 0
2. |xy| = |x| · |y| for all x, y ∈ K
3. |x + y| ≤ |x| + |y|


If a valuation satisfies |x + y| ≤ max(|x|, |y|), then we say that it is a non-archimedean
valuation. Otherwise we say that it is an archimedean valuation.


Every valuation on K defines a metric on K, given by d(x, y) := |x − y|. This metric is an
ultrametric if and only if the valuation is non-archimedean. Two valuations are equivalent
if their corresponding metrics induce the same topology on K. An equivalence class v of
valuations on K is called a prime of K. If v consists of archimedean valuations, we say that
v is an infinite prime, or archimedean prime. Otherwise, we say that v is a finite prime, or
non-archimedean prime.


In the case where K is a number field, primes as defined above generalize the notion of
prime ideals in the following way. Let p ⊂ K be a nonzero prime ideal (see footnote 1), considered as a
fractional ideal. For every nonzero element x ∈ K, let r be the unique integer such that
$x \in p^{r}$ but $x \notin p^{r+1}$. Define

\[ |x|_p := \frac{1}{N(p)^{r}}, \]

where N(p) denotes the absolute norm of p. Then $|\cdot|_p$ is a non-archimedean valuation on
K, and furthermore every non-archimedean valuation on K is equivalent to $|\cdot|_p$ for some
prime ideal p. Hence, the prime ideals of K correspond bijectively with the finite primes of
K, and it is in this sense that the notion of primes as valuations generalizes that of a prime
ideal.
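For K = Q the construction above specializes to the usual p-adic valuations, since the prime ideal (p) has absolute norm p. The following sketch, with hypothetical function names, computes this valuation on rational numbers exactly and can be used to spot-check the axioms, including the stronger non-archimedean inequality.

```python
from fractions import Fraction


def ord_p(x: Fraction, p: int) -> int:
    """Exponent r with x in (p)^r but not in (p)^(r+1); assumes x != 0 and p prime."""
    r, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        r += 1
    while den % p == 0:
        den //= p
        r -= 1
    return r


def abs_p(x, p: int) -> Fraction:
    """The valuation |x|_p = 1 / N(p)^r on Q, where N((p)) = p and |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(1, p) ** ord_p(x, p)


print(abs_p(12, 2), abs_p(Fraction(5, 8), 2), abs_p(7, 2))
```

Exact rational arithmetic is used so that comparisons such as |x + y| ≤ max(|x|, |y|) are not affected by rounding.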


As for the archimedean valuations, when K is a number field every embedding of K into R
or C yields a valuation of K by way of the standard absolute value on R or C, and one can
show that every archimedean valuation of K is equivalent to one arising in this way. Thus
the infinite primes of K correspond to embeddings of K into R or C, and we call such a
prime real or complex according to whether the valuations comprising it arise from real or
complex embeddings.

Footnote 1: By "prime ideal" we mean "prime ideal of K" or equivalently "prime ideal of the
ring of integers of K". We do not mean literally a prime ideal of the ring K, which would be the zero ideal.
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Chapter 150 


11S15 — Ramification and extension
theory


150.1 decomposition group 


150.1.1 Decomposition Group 


Let A be a noetherian integrally closed integral domain with field of fractions K. Let L be
a Galois extension of K and denote by B the integral closure of A in L. Then, for any
prime ideal p ⊂ A, the Galois group G := Gal(L/K) acts transitively on the set of all
prime ideals P ⊂ B containing p. If we fix a particular prime ideal P ⊂ B lying over p,
then the stabilizer of P under this group action is a subgroup of G, called the decomposition
group at P and denoted D(P/p). In other words,

\[ D(P/p) := \{\sigma \in G \mid \sigma(P) = P\}. \]

If P′ ⊂ B is another prime ideal of B lying over p, then the decomposition groups D(P/p)
and D(P′/p) are conjugate in G via any Galois automorphism mapping P to P′.


150.1.2 Inertia Group 


Write l for the residue field B/P and k for the residue field A/p. Assume that the extension
l/k is separable (if it is not, then this development is still possible, but considerably more
complicated; see [1, p. 20]). Any element σ ∈ D(P/p), by definition, fixes P and hence
descends to a well defined automorphism of the field l. Since σ also fixes A by virtue of
being in G, it induces an automorphism of the extension l/k fixing k. We therefore have a
group homomorphism

\[ D(P/p) \to \mathrm{Gal}(l/k), \]

and the kernel of this homomorphism is called the inertia group of P, and written T(P/p). It
turns out that this homomorphism is actually surjective, so there is an exact sequence

\[ 1 \to T(P/p) \to D(P/p) \to \mathrm{Gal}(l/k) \to 1 \qquad (150.1.1) \]


150.1.3 Decomposition of Extensions 


The decomposition group is so named because it can be used to decompose the field extension
L/K into a series of intermediate extensions each of which has very simple factorization
behavior at p. If we let $L^D$ denote the fixed field of D(P/p) and $L^T$ the fixed field of T(P/p),
then the exact sequence (150.1.1) corresponds under Galois theory to the lattice of fields

    L
    |  e
    L^T
    |  f
    L^D
    |  g
    K


If we write e, f, g for the degrees of these intermediate extensions as in the diagram, then we 
have the following remarkable series of equalities: 


1. The number e equals the ramification index e(P/p) of P over p, which is independent
of the choice of prime ideal P lying over p since L/K is Galois.


2. The number f equals the inertial degree f(P/p) of P over p, which is also independent
of the choice of prime ideal P since L/K is Galois.


3. The number g is equal to the number of prime ideals P of B that lie over p ⊂ A.

Furthermore, the fields $L^D$ and $L^T$ have the following independent characterizations:

• $L^T$ is the smallest intermediate field F such that P is totally ramified over P ∩ F, and
it is the largest intermediate field such that e(P ∩ F, p) = 1.


• $L^D$ is the smallest intermediate field F such that P is the only prime of B lying over
P ∩ F, and it is the largest intermediate field such that e(P ∩ F, p) = f(P ∩ F, p) = 1.


Informally, this decomposition of the extension says that the extension $L^D/K$ encapsulates
all of the factorization of p into distinct primes, while the extension $L^T/L^D$ is the source
of all the inertial degree in P over p and the extension $L/L^T$ is responsible for all of the
ramification that occurs over p.
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150.1.4 Localization 


The decomposition groups and inertia groups of P behave well under localization. That
is, the decomposition and inertia groups of $PB_P \subset B_P$ over the prime ideal $pA_p$ in the
localization $A_p$ of A are identical to the ones obtained using A and B themselves. In fact,
the same holds true even in the completions of the local rings $A_p$ and $B_P$ at p and P.


REFERENCES 


1. J.P. Serre, Local Fields, Springer-Verlag, 1979 (GTM 67) 
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150.2 examples of prime ideal decomposition in num- 


ber fields 


Here we follow the notation of the entry on the decomposition group. See also this entry.


Example 1

Let K = Q(√−7); then Gal(K/Q) = {Id, σ} ≅ Z/2Z, where σ is the complex conjugation
map. Let $O_K$ be the ring of integers of K. In this case:

\[ O_K = \mathbb{Z}\left[\frac{1 + \sqrt{-7}}{2}\right] \]

The discriminant of this field is $D_{K/Q} = -7$. We look at the decomposition in prime ideals
of some prime ideals in Z:

1. The only prime ideal in Z that ramifies is (7):

\[ (7)O_K = (\sqrt{-7})^2 \]

and we have e = 2, f = g = 1. Next we compute the decomposition and inertia groups
from the definitions. Notice that both Id and σ fix the ideal (√−7). Thus:

\[ D((\sqrt{-7})/(7)) = \mathrm{Gal}(K/\mathbb{Q}) \]

For the inertia group, notice that σ ≡ Id mod (√−7). Hence:

\[ T((\sqrt{-7})/(7)) = \mathrm{Gal}(K/\mathbb{Q}) \]

Also note that this is trivial if we use the properties of the fixed fields of D((√−7)/(7))
and T((√−7)/(7)) (see the section on "decomposition of extensions" in the entry on
decomposition group), and the fact that e · f · g = n, where n is the degree of the
extension (n = 2 in our case).

2. The primes (5), (13) are inert, i.e. they are prime ideals in $O_K$. Thus e = 1 = g, f = 2.
Obviously, the map σ fixes the ideals (5), (13), so

\[ D(5O_K/(5)) = \mathrm{Gal}(K/\mathbb{Q}) = D(13O_K/(13)) \]

On the other hand σ(√−7) = −√−7 ≢ √−7 mod (5), (13), so σ ≢ Id mod (5), (13) and

\[ T(5O_K/(5)) = \{\mathrm{Id}\} = T(13O_K/(13)) \]

3. The primes (2), (29) are split:

\[ 2O_K = \left(2, \frac{1+\sqrt{-7}}{2}\right)\left(2, \frac{1-\sqrt{-7}}{2}\right) = P \cdot P' \]
\[ 29O_K = (29, 14 + \sqrt{-7})(29, 14 - \sqrt{-7}) = R \cdot R' \]

so e = f = 1, g = 2 and

\[ D(P/(2)) = T(P/(2)) = \{\mathrm{Id}\} = D(R/(29)) = T(R/(29)) \]


Example 2 


Let ¢7 = er, i.e. a 7"-root of unity, and let L = Q(¢7). This is a cyclotomic extension) of 
Q with 
Gal(L/Q) S (Z/7Z)* =~ Z/6Z 


Moreover 
Gal(L/Q) = {oa: L > L | oa(Gr) =$, a € (Z/T7Z)*} 


gives us the subfields) of L: L = Q(z) 


SY 
Q(¢7 + C$) 
Q(V/-7) 
CS 


The discriminant of the extension L/Q is Drjg = —7°. Let Oz denote the ring of integers of 
L, thus Oz = Z[¢;|. We use the results of |this entry|to find the decomposition of the primes 
2, 5,7,13,29: 


L = QQ) =g P-P (5) Dı + Q2: Qs 
K= v HVA) 6 (13) 
Q (7) (2) (5) (13) 


1. The prime ideal 7Z is totally ramified in L, and the only prime ideal that ramifies: 
Or) a 


Thus 

e(Z/(7)) =6, F(E/(7)) = g(Z/(7)) = 1 
Note that, by the properties of the fixed fields of decomposition and inertia groups, we 
must have LT(@/™) = Q = LP@/™), thus, by Galois theory, 


D(Z/(7)) = T(E/(7)) = Gal(L/Q 


2. The ideal 2Z factors) in K as above, 20x = P - P', and each of the prime ideals P, P’ 
remains inert from K to L, i.e. PO; = P, a prime ideal of L. Note also that the [order] 
of 2 mod 7 is 3, and since g is at least 2, 2-3 = 6, so e must equal 1 (recall that 
efg=n): 

e(B/(2))=1, f(P/2))=3,  g(B/(2)) =2 
Since e = 1, LT®/) = L, and [be LORE) = 3 so 


D(P/(2)) =< o2 >= Z/3Z, T(P/(2)) = {Id} 
3. The ideal (5) is inert, 50; = G is prime and the order of 5 modulo 7 is 6. Thus: 
e(6/(5))=1,  F(G/(5))=6,  g(G/(5)) =1 


D(G/(5)) = Gal(L/Q),  T(G/(5)) = {1d} 


4. The prime ideal 13Z is inert in K but it splits in L, 130; = Q, - Q.- Qs, and 
13 & 6 & —1 mod 7, so the order of 13 is 2: 


e(Q,/(13))=1, f(Q:/(13))=2, g(Q:/(13)) = 3 
D(Q,/(13)) =< oe >= Z/2Z, T(Q;/(13)) = {1d} 
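5. The prime ideal $29\mathbb{Z}$ splits completely in $L$,
$$29\mathcal{O}_L = \mathfrak{R}_1 \cdot \mathfrak{R}_2 \cdots \mathfrak{R}_6$$
Also $29 \equiv 1 \mod 7$, so $f = 1$,
$$e(\mathfrak{R}_i/(29)) = 1, \quad f(\mathfrak{R}_i/(29)) = 1, \quad g(\mathfrak{R}_i/(29)) = 6$$


$$D(\mathfrak{R}_i/(29)) = T(\mathfrak{R}_i/(29)) = \{\mathrm{Id}\}$$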
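

The five cases above all follow one pattern: for a prime $p \neq 7$ the inertial degree $f$ is the multiplicative order of $p$ modulo 7, $e = 1$, and $g = 6/f$. The following Python sketch tabulates this; the function names are illustrative and not part of the original entry, and the rule is hard-coded for the field $\mathbb{Q}(\zeta_7)$ only.

```python
def multiplicative_order(p, m):
    """Smallest f >= 1 with p**f congruent to 1 mod m; assumes gcd(p, m) == 1."""
    f, power = 1, p % m
    while power != 1:
        power = power * p % m
        f += 1
    return f


def decomposition_in_q_zeta7(p):
    """Return (e, f, g) for a rational prime p in L = Q(zeta_7)."""
    n = 6  # [L : Q] = phi(7)
    if p == 7:
        return (n, 1, 1)            # totally ramified
    f = multiplicative_order(p, 7)  # inertial degree
    return (1, f, n // f)           # unramified, and e*f*g = n


for p in (2, 5, 7, 13, 29):
    print(p, decomposition_in_q_zeta7(p))
```

The printed triples can be compared directly with the values of $e$, $f$, $g$ listed in items 1 to 5.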
150.3 inertial degree


Let $\iota : A \to B$ be a ring homomorphism. Let $\mathfrak{P} \subset B$ be a prime ideal, with $\mathfrak{p} := \iota^{-1}(\mathfrak{P}) \subset A$.
The algebra map $\iota$ induces an $A/\mathfrak{p}$ module structure on the ring $B/\mathfrak{P}$. If the dimension of
$B/\mathfrak{P}$ as an $A/\mathfrak{p}$ module exists, then it is called the inertial degree of $\mathfrak{P}$ over $A$.


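A particular case of special importance in number theory is when $L/K$ is a field extension
and $\iota : \mathcal{O}_K \to \mathcal{O}_L$ is the inclusion map of the rings of integers. In this case, the ring
$\mathcal{O}_K/\mathfrak{p}$ is a field, so $\dim_{\mathcal{O}_K/\mathfrak{p}} \mathcal{O}_L/\mathfrak{P}$ is guaranteed to exist, and the inertial degree of $\mathfrak{P}$ over
$\mathcal{O}_K$ is denoted $f(\mathfrak{P}/\mathfrak{p})$. We have the formula


$$\sum_{\mathfrak{P} \mid \mathfrak{p}} e(\mathfrak{P}/\mathfrak{p}) f(\mathfrak{P}/\mathfrak{p}) = [L : K],$$


where $e(\mathfrak{P}/\mathfrak{p})$ is the ramification index of $\mathfrak{P}$ over $\mathfrak{p}$ and the sum is taken over all prime ideals
$\mathfrak{P}$ of $\mathcal{O}_L$ dividing $\mathfrak{p}\mathcal{O}_L$.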


Example: 


Let $\iota : \mathbb{Z} \to \mathbb{Z}[i]$ be the inclusion of the integers into the Gaussian integers. A prime $p$ in $\mathbb{Z}$
may or may not factor in $\mathbb{Z}[i]$; if it does factor, then it must factor as $p = (x + yi)(x - yi)$ for
some integers $x, y$. Thus a prime $p$ factors into two primes if it equals $x^2 + y^2$, and remains
prime in $\mathbb{Z}[i]$ otherwise. There are then three categories of primes in $\mathbb{Z}[i]$:


1. The prime 2 factors as $(1 + i)(1 - i)$, and the principal ideals generated by $(1 + i)$ and
$(1 - i)$ are equal in $\mathbb{Z}[i]$, so the ramification index of $(1 + i)$ over $\mathbb{Z}$ is two. The ring
$\mathbb{Z}[i]/(1 + i)$ is isomorphic to $\mathbb{Z}/2$, so the inertial degree $f((1 + i)/(2))$ is one.


2. For primes $p \equiv 1 \pmod 4$, the prime $p \in \mathbb{Z}$ factors into the product of the two primes
$(x + yi)(x - yi)$, with ramification index and inertial degree one.


3. For primes $p \equiv 3 \pmod 4$, the prime $p$ remains prime in $\mathbb{Z}[i]$ and $\mathbb{Z}[i]/(p)$ is a two
dimensional field extension of $\mathbb{Z}/p$, so the inertial degree is two and the ramification
index is one.


In all cases, the sum of the products of the inertial degree and ramification index is equal to
2, which is the dimension of the corresponding extension $\mathbb{Q}(i)/\mathbb{Q}$ of number fields.
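

As a small check of the three categories, the sketch below classifies a rational prime by its residue modulo 4 and, where possible, searches for the representation $p = x^2 + y^2$ by brute force. The function names are hypothetical, and the input is assumed to be prime (no primality test is performed).

```python
from math import isqrt


def gaussian_splitting(p):
    """Return (e, f, g) for a rational prime p in Z[i]; p is assumed prime."""
    if p == 2:
        return (2, 1, 1)   # ramified: (2) = (1 + i)^2 as ideals
    if p % 4 == 1:
        return (1, 1, 2)   # splits into two distinct primes
    return (1, 2, 1)       # inert: p stays prime


def two_squares(p):
    """Return (x, y) with x*x + y*y == p and x, y >= 1, or None if none exists."""
    x = 1
    while x * x < p:
        y = isqrt(p - x * x)
        if y >= 1 and x * x + y * y == p:
            return (x, y)
        x += 1
    return None


for p in (2, 3, 5, 7, 13):
    e, f, g = gaussian_splitting(p)
    print(p, (e, f, g), two_squares(p), e * f * g)
```

In every row the product $e \cdot f \cdot g$ is 2, in line with the formula $\sum e f = [\mathbb{Q}(i) : \mathbb{Q}]$.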


150.3.1 Local interpretations & generalizations 


For any extension $\iota : A \to B$ of Dedekind domains, the inertial degree of the prime $\mathfrak{P} \subset B$
over the prime $\mathfrak{p} := \iota^{-1}(\mathfrak{P}) \subset A$ is equal to the inertial degree of $\mathfrak{P}B_\mathfrak{P}$ over $\mathfrak{p}A_\mathfrak{p}$ in the
localizations at $\mathfrak{P}$ and $\mathfrak{p}$. Moreover, the same is true even if we pass to completions of the
local rings $B_\mathfrak{P}$ and $A_\mathfrak{p}$ at $\mathfrak{P}$ and $\mathfrak{p}$. The preservation of inertial degree and ramification indices
with respect to localization is one of the reasons why the technique of localization is a useful
tool in the study of such domains.


As in the case of ramification indices, it is possible to define the notion of inertial degree
in the more general setting of locally ringed spaces. However, the generalizations of inertial
degree are not as widely used because in algebraic geometry one usually works with a fixed
base field, which makes all the residue fields at the points equal to the same field.


150.4 ramification index


150.4.1 Ramification in number fields 


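Definition 5 (First definition). Let $L/K$ be an extension of number fields. Let $\mathfrak{p}$ be a
nonzero prime ideal in the ring of integers $\mathcal{O}_K$ of $K$, and suppose the ideal $\mathfrak{p}\mathcal{O}_L \subset \mathcal{O}_L$ factors
as
$$\mathfrak{p}\mathcal{O}_L = \prod_{i=1}^{n} \mathfrak{P}_i^{e_i}$$
for some prime ideals $\mathfrak{P}_i \subset \mathcal{O}_L$ and exponents $e_i \in \mathbb{N}$. The natural number $e_i$ is called the
ramification index of $\mathfrak{P}_i$ over $\mathfrak{p}$. It is often denoted $e(\mathfrak{P}_i/\mathfrak{p})$. If $e_i > 1$ for any $i$, then we say
the ideal $\mathfrak{p}$ ramifies in $L$.


Likewise, if $\mathfrak{P}$ is a nonzero prime ideal in $\mathcal{O}_L$, and $\mathfrak{p} := \mathfrak{P} \cap \mathcal{O}_K$, then we say $\mathfrak{P}$ ramifies over
$K$ if the ramification index $e(\mathfrak{P}/\mathfrak{p})$ of $\mathfrak{P}$ in the factorization of the ideal $\mathfrak{p}\mathcal{O}_L \subset \mathcal{O}_L$ is greater
than 1. That is, a prime $\mathfrak{p}$ in $\mathcal{O}_K$ ramifies in $L$ if at least one prime $\mathfrak{P}$ dividing $\mathfrak{p}\mathcal{O}_L$ ramifies
over $K$. If $L/K$ is a Galois extension, then the ramification indices of all the primes dividing
$\mathfrak{p}\mathcal{O}_L$ are equal, since the Galois group is transitive on this set of primes.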


150.4.2 The local view 


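The phenomenon of ramification has an equivalent interpretation in terms of local rings.
With $L/K$ as before, let $\mathfrak{P}$ be a prime in $\mathcal{O}_L$ with $\mathfrak{p} := \mathfrak{P} \cap \mathcal{O}_K$. Then the induced
map of localizations $(\mathcal{O}_K)_\mathfrak{p} \to (\mathcal{O}_L)_\mathfrak{P}$ is a local homomorphism of local rings (in fact, of
discrete valuation rings), and the ramification index of $\mathfrak{P}$ over $\mathfrak{p}$ is the unique natural number
$e$ such that

$$\mathfrak{p}(\mathcal{O}_L)_\mathfrak{P} = (\mathfrak{P}(\mathcal{O}_L)_\mathfrak{P})^e \subset (\mathcal{O}_L)_\mathfrak{P}.$$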


An astute reader may notice that this formulation of ramification index does not require
that $L$ and $K$ be number fields, or even that they play any role at all. We take advantage
of this fact here to give a second, more general definition.


Definition 6 (Second definition). Let $\iota : A \to B$ be any ring homomorphism. Suppose
$\mathfrak{P} \subset B$ is a prime ideal such that the localization $B_\mathfrak{P}$ of $B$ at $\mathfrak{P}$ is a discrete valuation ring.
Let $\mathfrak{p}$ be the prime ideal $\iota^{-1}(\mathfrak{P}) \subset A$, so that $\iota$ induces a local homomorphism $\iota_\mathfrak{P} : A_\mathfrak{p} \to B_\mathfrak{P}$.
Then the ramification index $e(\mathfrak{P}/\mathfrak{p})$ is defined to be the unique natural number such that


$$\iota(\mathfrak{p})B_\mathfrak{P} = (\mathfrak{P}B_\mathfrak{P})^{e(\mathfrak{P}/\mathfrak{p})} \subset B_\mathfrak{P},$$


or $\infty$ if $\iota(\mathfrak{p})B_\mathfrak{P} = (0)$.


The reader who is not interested in local rings may assume that $A$ and $B$ are unique factorization domains,
in which case $e(\mathfrak{P}/\mathfrak{p})$ is the exponent of $\mathfrak{P}$ in the factorization of the ideal $\iota(\mathfrak{p})B$, just as in our
first definition (but without the requirement that the rings $A$ and $B$ originate from number
fields).


There is of course much more that can be said about ramification indices even in this purely
algebraic setting, but we limit ourselves to the following remarks:


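1. Suppose $A$ and $B$ are themselves discrete valuation rings, with respective maximal ideals
$\mathfrak{p}$ and $\mathfrak{P}$. Let $\hat{A} := \varprojlim A/\mathfrak{p}^n$ and $\hat{B} := \varprojlim B/\mathfrak{P}^n$ be the completions of $A$ and $B$ with
respect to $\mathfrak{p}$ and $\mathfrak{P}$. Then

$$e(\mathfrak{P}/\mathfrak{p}) = e(\mathfrak{P}\hat{B}/\mathfrak{p}\hat{A}). \qquad (150.4.1)$$


In other words, the ramification index of $\mathfrak{P}$ over $\mathfrak{p}$ in the $A$-algebra $B$ equals the
ramification index in the completions of $A$ and $B$ with respect to $\mathfrak{p}$ and $\mathfrak{P}$.


2. Suppose $A$ and $B$ are Dedekind domains with respective fraction fields $K$ and $L$. If
$B$ equals the integral closure of $A$ in $L$, then


$$\sum_{\mathfrak{P} \mid \mathfrak{p}} e(\mathfrak{P}/\mathfrak{p}) f(\mathfrak{P}/\mathfrak{p}) \leq [L : K], \qquad (150.4.2)$$


where $\mathfrak{P}$ ranges over all prime ideals in $B$ that divide $\mathfrak{p}B$, and $f(\mathfrak{P}/\mathfrak{p}) := \dim_{A/\mathfrak{p}}(B/\mathfrak{P})$
is the inertial degree of $\mathfrak{P}$ over $\mathfrak{p}$. Equality holds in Equation (150.4.2) whenever $B$ is
finitely generated as an $A$-module.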


150.4.3 Ramification in algebraic geometry 


The word “ramify” in English means “to divide into two or more branches,” and we will
show in this section that the mathematical term lives up to its common English meaning.


Definition 7 (Algebraic version). Let $f : C_1 \to C_2$ be a non-constant regular morphism
of curves (by which we mean one dimensional algebraic varieties) over
an algebraically closed field $k$. Then $f$ has a nonzero degree $n := \deg f$, which can be defined
in any of the following ways:


• The number of points in a generic fiber $f^{-1}(p)$, for $p \in C_2$


• The maximum number of points in $f^{-1}(p)$, for $p \in C_2$


• The degree of the extension $k(C_1)/f^*k(C_2)$ of function fields


[Figure 150.1: The function $f(y) = y^2$ near $y = 0$.]


There is a finite set of points $p \in C_2$ for which the inverse image $f^{-1}(p)$ does not have
size $n$, and we call these points the branch points or ramification points of $f$. If $P \in C_1$ with
$f(P) = p$, then the ramification index $e(P/p)$ of $f$ at $P$ is the ramification index obtained
algebraically from Definition 6 by taking


• $A = k[C_2]_p$, the local ring consisting of all rational functions in the function field $k(C_2)$
which are regular at $p$.


• $B = k[C_1]_P$, the local ring consisting of all rational functions in the function field $k(C_1)$
which are regular at $P$.


• $\mathfrak{p} = \mathfrak{m}_p$, the maximal ideal in $A$ consisting of all functions which vanish at $p$.


• $\mathfrak{P} = \mathfrak{m}_P$, the maximal ideal in $B$ consisting of all functions which vanish at $P$.


• $\iota = f_p^* : k[C_2]_p \to k[C_1]_P$, the map on the function fields induced by the morphism $f$.


Example 1. The following picture may be worth a thousand words. Let $k = \mathbb{C}$ and $C_1 =
C_2 = \mathbb{C} = \mathbb{A}^1_\mathbb{C}$. Take the map $f : \mathbb{C} \to \mathbb{C}$ given by $f(y) = y^2$. Then $f$ is plainly a map of
degree 2, and every point in $C_2$ except for 0 has two preimages in $C_1$. The point 0 is thus a
ramification point of $f$ of index 2, and near 0 the graph of $f$ is as shown in Figure 150.1.


Note that we have only drawn the real locus of $f$ because that is all that can fit into two
dimensions. We see from the figure that a typical point on $C_2$ such as the point $x = 1$ has
two points in $C_1$ which map to it, but that the point $x = 0$ has only one corresponding point
of $C_1$ which “branches” or “ramifies” into two distinct points of $C_1$ whenever one moves
away from 0.


150.4.4 Relation to the number field case 


The relationship between Definition 6 and Definition 7 is easiest to explain in the case where
$f$ is a map between affine varieties. When $C_1$ and $C_2$ are affine, then their coordinate rings
$k[C_1]$ and $k[C_2]$ are Dedekind domains, and the points of the curve $C_1$ (respectively, $C_2$)
correspond naturally with the maximal ideals of the ring $k[C_1]$ (respectively, $k[C_2]$). The
ramification points of the curve $C_1$ are then exactly the points of $C_1$ which correspond
to maximal ideals of $k[C_1]$ that ramify in the algebraic sense, with respect to the map
$f^* : k[C_2] \to k[C_1]$ of coordinate rings.


Equation (150.4.2) in this case says


$$\sum_{P \in f^{-1}(p)} e(P/p) = n,$$


and we see that the well known formula (150.4.2) in number theory is simply the algebraic
analogue of the geometric fact that the number of points in the fiber of $f$, counting
multiplicities, is always $n$.


Example 2. Let $f : \mathbb{C} \to \mathbb{C}$ be given by $f(y) = y^2$ as in Example 1. Since $C_2$ is just the
affine line, the coordinate ring $\mathbb{C}[C_2]$ is equal to $\mathbb{C}[X]$, the polynomial ring in one variable
over $\mathbb{C}$. Likewise, $\mathbb{C}[C_1] = \mathbb{C}[Y]$, and the induced map $f^* : \mathbb{C}[X] \to \mathbb{C}[Y]$ is naturally given
by $f^*(X) = Y^2$. We may accordingly identify the coordinate ring $\mathbb{C}[C_2]$ with the subring
$\mathbb{C}[X^2]$ of $\mathbb{C}[X] = \mathbb{C}[C_1]$.


Now, the ring $\mathbb{C}[X]$ is a principal ideal domain, and the maximal ideals in $\mathbb{C}[X]$ are exactly
the principal ideals of the form $(X - a)$ for any $a \in \mathbb{C}$. Hence the nonzero prime ideals in
$\mathbb{C}[X^2]$ are of the form $(X^2 - a)$, and these factor in $\mathbb{C}[X]$ as


$$(X^2 - a) = (X - \sqrt{a})(X + \sqrt{a}) \subset \mathbb{C}[X].$$


Note that the two prime ideals $(X - \sqrt{a})$ and $(X + \sqrt{a})$ of $\mathbb{C}[X]$ are equal only when $a = 0$,
so we see that the ideal $(X^2 - a)$ in $\mathbb{C}[X^2]$, corresponding to the point $a \in C_2$, ramifies in $C_1$
exactly when $a = 0$. We have therefore recovered our previous geometric characterization of
the ramified points of $f$, solely in terms of the algebraic factorizations of ideals in $\mathbb{C}[X]$.
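

The same count can be made numerically. The sketch below lists the distinct preimages of a point $a$ under $f(y) = y^2$ and assigns each the ramification index $2 / \#f^{-1}(a)$, so that the indices over any point sum to the degree 2. The helper names are illustrative only, and floating-point square roots are used, so this is a sanity check rather than an algebraic computation.

```python
import cmath


def fiber(a):
    """Distinct complex preimages of a under f(y) = y**2."""
    r = cmath.sqrt(a)
    return {r} if r == 0 else {r, -r}


def ramification_indices(a):
    """Map each preimage of a to its ramification index for f(y) = y**2."""
    points = fiber(a)
    return {y: 2 // len(points) for y in points}


for a in (0, 1, -4):
    indices = ramification_indices(a)
    print(a, indices, sum(indices.values()))
```

Only $a = 0$ has a single preimage, which therefore carries index 2; every other point has two preimages of index 1.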


In the case where $f$ is a map between projective varieties, Definition 6 does not directly
apply to the coordinate rings of $C_1$ and $C_2$, but only to those of open covers of $C_1$ and $C_2$
by affine varieties. Thus we do have an instance of yet another new phenomenon here, and
rather than keep the reader in suspense we jump straight to the final, most general definition
of ramification that we will give.


Definition 8 (Final form). Let $f : (X, \mathcal{O}_X) \to (Y, \mathcal{O}_Y)$ be a morphism of locally ringed spaces.
Let $p \in X$ and suppose that the stalk $(\mathcal{O}_X)_p$ is a discrete valuation ring. Write $\phi_p :
(\mathcal{O}_Y)_{f(p)} \to (\mathcal{O}_X)_p$ for the induced map of $f$ on stalks at $p$. Then the ramification index
of $p$ over $Y$ is the unique natural number $e$, if it exists (or $\infty$ if it does not exist), such
that

$$\phi_p(\mathfrak{m}_{f(p)})(\mathcal{O}_X)_p = \mathfrak{m}_p^e,$$
where $\mathfrak{m}_p$ and $\mathfrak{m}_{f(p)}$ are the respective maximal ideals of $(\mathcal{O}_X)_p$ and $(\mathcal{O}_Y)_{f(p)}$. We say $p$ is
ramified in $Y$ if $e > 1$.


Example 3. A ring homomorphism $\iota : A \to B$ corresponds functorially to a morphism
$\mathrm{Spec}(B) \to \mathrm{Spec}(A)$ of locally ringed spaces from the prime spectrum of $B$ to that of $A$,
and the algebraic notion of ramification from Definition 6 equals the sheaf-theoretic notion
of ramification from Definition 8.


Example 4. For any morphism of varieties $f : C_1 \to C_2$, there is an induced morphism
$f^\#$ on the structure sheaves of $C_1$ and $C_2$, which are locally ringed spaces. If $C_1$ and $C_2$
are curves, then the stalks are one dimensional regular local rings and therefore discrete
valuation rings, so in this way we recover the algebraic geometric definition (Definition 7)
from the sheaf definition (Definition 8).


150.4.5 Ramification in complex analysis 


Ramification points or branch points in complex analysis are merely a special case of the
high-flown terminology of Definition 8. However, they are important enough to merit
a separate mention here.


Definition 9 (Analytic version). Let $f : M \to N$ be a holomorphic map of Riemann
surfaces. For any $p \in M$, there exist local coordinate charts $U$ and $V$ around $p$ and $f(p)$
such that $f$ is locally the map $z \mapsto z^e$ from $U$ to $V$. The natural number $e$ is called the
ramification index of $f$ at $p$, and $p$ is said to be a branch point or ramification point of $f$ if
$e > 1$.


Example 5. Take the map $f : \mathbb{C} \to \mathbb{C}$, $f(y) = y^2$ of Example 1. We study the behavior
of $f$ near the unramified point $y = 1$ and near the ramified point $y = 0$. Near $y = 1$, take
the coordinate $w = y - 1$ on the domain and $v = x - 1$ on the range. Then $f$ maps $w + 1$
to $(w + 1)^2$, which in the $v$ coordinate is $(w + 1)^2 - 1 = 2w + w^2$. If we change coordinates
to $z = 2w + w^2$ on the domain, keeping $v$ on the range, then $f(z) = z$, so the ramification
index of $f$ at $y = 1$ is equal to 1.


Near $y = 0$, the function $f(y) = y^2$ is already in the form $z \mapsto z^e$ with $e = 2$, so the
ramification index of $f$ at $y = 0$ is equal to 2.
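

For a polynomial map over $\mathbb{C}$, the exponent $e$ in the local form $z \mapsto z^e$ at a point $y_0$ is the order of vanishing of $f(y) - f(y_0)$ at $y_0$, i.e. the index of the first nonvanishing derivative. The sketch below computes it that way; the function name and the coefficient-list representation are assumptions made for illustration, and exact integer or rational inputs are assumed so that the zero tests are reliable.

```python
def ramification_index(coeffs, y0):
    """Ramification index at y0 of the polynomial map with coefficients coeffs.

    coeffs[i] is the coefficient of y**i; the polynomial must be non-constant.
    Returns the smallest e >= 1 whose e-th derivative does not vanish at y0.
    """
    derivative = list(coeffs)
    e = 0
    while True:
        # Differentiate once: d/dy of c*y**i is i*c*y**(i-1).
        derivative = [i * c for i, c in enumerate(derivative)][1:]
        e += 1
        if sum(c * y0 ** i for i, c in enumerate(derivative)) != 0:
            return e


square = [0, 0, 1]          # f(y) = y^2
print(ramification_index(square, 1))   # unramified point y = 1
print(ramification_index(square, 0))   # ramified point y = 0
```

For $f(y) = y^2$ this reproduces the two values found in Example 5.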


150.4.6 Algebraic-analytic correspondence


Of course, the analytic notion of ramification given in Definition 9 can be couched in terms
of locally ringed spaces as well. Any Riemann surface together with its sheaf of holomorphic
functions is a locally ringed space. Furthermore the stalk at any point is always a discrete
valuation ring, because germs of holomorphic functions have Taylor expansions making the
stalk isomorphic to the power series ring $\mathbb{C}[[z]]$. We can therefore apply Definition 8 to
any holomorphic map of Riemann surfaces, and it is not surprising that this process yields
the same results as Definition 9.


More generally, every map of algebraic varieties $f : V \to W$ can be interpreted as a
holomorphic map of Riemann surfaces in the usual way, and the ramification points on $V$
and $W$ under $f$ as algebraic varieties are identical to their ramification points as Riemann
surfaces. It turns out that the analytic structure may be regarded in a certain sense as
the “completion” of the algebraic structure, and in this sense the algebraic-analytic
correspondence between the ramification points may be regarded as the geometric version of the
equality (150.4.1) in number theory.


The algebraic-analytic correspondence of ramification points is itself only one manifestation
of the wide ranging identification between algebraic geometry and analytic geometry which
is explained to great effect in the seminal paper of Serre [6].


150.5 unramified action

Let $K$ be a number field and let $\nu$ be a discrete valuation on $K$ (this might be, for example,
the valuation attached to a prime ideal $\mathfrak{P}$ of $K$).
Let $K_\nu$ be the completion of $K$ at $\nu$, and let $\mathcal{O}_\nu$ be the ring of integers of $K_\nu$, i.e.

$$\mathcal{O}_\nu = \{k \in K_\nu \mid \nu(k) \geq 0\}$$
The maximal ideal of $\mathcal{O}_\nu$ will be denoted by

$$\mathcal{M} = \{k \in K_\nu \mid \nu(k) > 0\}$$
and we denote by $k_\nu$ the residue field of $K_\nu$, which is

$$k_\nu = \mathcal{O}_\nu/\mathcal{M}$$

We will consider three different global Galois groups, namely


$$G_{\overline{K}/K} = \mathrm{Gal}(\overline{K}/K)$$
$$G_{\overline{K_\nu}/K_\nu} = \mathrm{Gal}(\overline{K_\nu}/K_\nu)$$
$$G_{\overline{k_\nu}/k_\nu} = \mathrm{Gal}(\overline{k_\nu}/k_\nu)$$


where $\overline{K}$, $\overline{K_\nu}$, $\overline{k_\nu}$ are algebraic closures of the corresponding fields. We also define
notation for the inertia group of $G_{\overline{K_\nu}/K_\nu}$:


$$I_\nu \subseteq G_{\overline{K_\nu}/K_\nu}$$
Definition 8. Let $S$ be a set and suppose there is a group action of $\mathrm{Gal}(\overline{K_\nu}/K_\nu)$ on $S$. We
say that $S$ is unramified at $\nu$, or the action of $G_{\overline{K_\nu}/K_\nu}$ on $S$ is unramified at $\nu$, if the action
of $I_\nu$ on $S$ is trivial, i.e.
$$\sigma(s) = s \quad \forall \sigma \in I_\nu,\ \forall s \in S$$


Remark: By Galois theory we know that $K_\nu^{nr}$, the fixed field of the inertia subgroup $I_\nu$,
is the maximal unramified extension of $K_\nu$, so


$$I_\nu \cong \mathrm{Gal}(\overline{K_\nu}/K_\nu^{nr})$$


Chapter 151


11S31 - Class field theory; p-adic
formal groups


151.1 Hilbert symbol


Let $K$ be any local field. For any two nonzero elements $a, b \in K^\times$, we define:


$$(a, b) := \begin{cases} +1 & \text{if } z^2 = ax^2 + by^2 \text{ has a nonzero solution } (x, y, z) \neq (0, 0, 0) \text{ in } K^3, \\ -1 & \text{otherwise.} \end{cases}$$
The number $(a, b)$ is called the Hilbert symbol of $a$ and $b$ in $K$.


Chapter 152


11S99 - Miscellaneous


152.1 p-adic integers 


152.1.1 Basic construction 


For any prime $p$, the $p$-adic integers is the ring obtained by taking the completion of the
ring $\mathbb{Z}$ with respect to the metric induced by the valuation


$$|x| := \frac{1}{p^{\nu_p(x)}}, \quad x \in \mathbb{Z}, \qquad (152.1.1)$$
where $\nu_p(x)$ denotes the largest integer $e$ such that $p^e$ divides $x$. The ring of $p$-adic integers
is usually denoted by $\mathbb{Z}_p$, and its fraction field by $\mathbb{Q}_p$.
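

The valuation in Equation (152.1.1) is easy to compute for ordinary integers. The sketch below does so with exact rational arithmetic; the function names are illustrative, and the convention $|0| = 0$ is an assumption added here, since the formula itself only covers $x \neq 0$.

```python
from fractions import Fraction


def valuation(x, p):
    """Largest e such that p**e divides the nonzero integer x."""
    if x == 0:
        raise ValueError("the valuation of 0 is infinite")
    e = 0
    while x % p == 0:
        x //= p
        e += 1
    return e


def padic_abs(x, p):
    """p-adic absolute value |x| = 1 / p**valuation(x, p), with |0| taken as 0."""
    return Fraction(0) if x == 0 else Fraction(1, p ** valuation(x, p))


print(valuation(12, 2), padic_abs(12, 2))     # 12 = 2^2 * 3
print(valuation(250, 5), padic_abs(250, 5))   # 250 = 2 * 5^3
```

Numbers divisible by a high power of $p$ are therefore small in this metric, which is what makes the completion differ from the real numbers.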


152.1.2 Profinite viewpoint 


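The ring $\mathbb{Z}_p$ of $p$-adic integers can also be constructed by taking the inverse limit
$$\mathbb{Z}_p := \varprojlim \mathbb{Z}/p^n\mathbb{Z}$$


over the inverse system $\cdots \to \mathbb{Z}/p^2\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z} \to 0$ consisting of the rings $\mathbb{Z}/p^n\mathbb{Z}$, for all
$n \geq 0$, with the projection maps defined to be the unique maps such that the diagram


$$\begin{array}{ccc} & \mathbb{Z} & \\ & \swarrow \quad \searrow & \\ \mathbb{Z}/p^{n+1}\mathbb{Z} & \longrightarrow & \mathbb{Z}/p^n\mathbb{Z} \end{array}$$


commutes. An algebraic and topological isomorphism between the two constructions is
obtained by taking the coordinatewise projection map $\mathbb{Z} \to \varprojlim \mathbb{Z}/p^n\mathbb{Z}$, extended to the
completion of $\mathbb{Z}$ under the $p$-adic metric.


This alternate characterization shows that $\mathbb{Z}_p$ is compact, since it is a closed subspace of the
space
$$\prod_{n \geq 0} \mathbb{Z}/p^n\mathbb{Z}$$
which is an infinite product of finite topological spaces and hence compact under the product topology.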
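

The inverse limit description can be made concrete by truncating it. In the sketch below an element of $\mathbb{Z}_p$ is approximated by its first few coordinates in $\mathbb{Z}/p\mathbb{Z}, \mathbb{Z}/p^2\mathbb{Z}, \ldots$ (the trivial coordinate $n = 0$ is omitted), and compatibility with the projection maps is checked directly. The function names are hypothetical; the modular inverse uses the three-argument `pow` with a negative exponent, which requires Python 3.8 or later.

```python
def coordinates(x, p, depth):
    """Image of the integer x in Z/pZ, Z/p^2 Z, ..., Z/p^depth Z."""
    return [x % p ** n for n in range(1, depth + 1)]


def is_compatible(coords, p):
    """True if each coordinate reduces to the previous one modulo the smaller power."""
    return all(coords[n] % p ** n == coords[n - 1] for n in range(1, len(coords)))


def inverse_coordinates(a, p, depth):
    """Coordinates of 1/a in Z_p, for an integer a not divisible by p."""
    return [pow(a, -1, p ** n) for n in range(1, depth + 1)]


print(coordinates(-1, 5, 3), is_compatible(coordinates(-1, 5, 3), 5))
third = inverse_coordinates(3, 2, 4)   # 1/3 is a 2-adic integer
print(third, is_compatible(third, 2))
```

The second example shows an element of $\mathbb{Z}_2$ that is not an ordinary integer: every coordinate of $1/3$ exists because 3 is invertible modulo every power of 2.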


152.1.3 Generalizations 


If we interpret the prime $p$ as an equivalence class of valuations on $\mathbb{Q}$, then the field $\mathbb{Q}_p$ is
simply the completion of the field $\mathbb{Q}$ with respect to the metric induced by any
member valuation of $p$ (indeed, the valuation defined in Equation (152.1.1) may serve as
the representative). This notion easily generalizes to other fields and valuations; namely,
if $K$ is any field, and $\mathfrak{p}$ is any prime of $K$, then the $\mathfrak{p}$-adic field $K_\mathfrak{p}$ is defined to be the
completion of $K$ with respect to any valuation in $\mathfrak{p}$. The analogue of the $p$-adic integers in
this case can be obtained by taking the subset (and subring) of $K_\mathfrak{p}$ consisting of all elements
of absolute value less than or equal to 1, which is well defined independent of the choice of
valuation representing $\mathfrak{p}$.


In the special case where $K$ is a number field, the $\mathfrak{p}$-adic ring $K_\mathfrak{p}$ is always a finite extension
of $\mathbb{Q}_p$ whenever $\mathfrak{p}$ is a finite prime, and is always equal to either $\mathbb{R}$ or $\mathbb{C}$ whenever $\mathfrak{p}$ is an
infinite prime.


152.2 local field


A local field is a topological field which is Hausdorff and locally compact as a topological space.


Examples of local fields include:


• Any field together with the discrete topology

• The field $\mathbb{R}$ of real numbers

• The field $\mathbb{C}$ of complex numbers

• The field $\mathbb{Q}_p$ of $p$-adic rationals, or any finite extension thereof.

• The field $\mathbb{F}_q((t))$ of formal Laurent series in one variable $t$ with coefficients in the
finite field $\mathbb{F}_q$ of $q$ elements.


In fact, this list is complete: every local field is isomorphic as a topological field to one of
the above fields.


Chapter 153


11Y05 - Factorization


153.1 Pollard’s rho method 


Say, for example, that you have a big number $n$ and you want to know the factors of $n$. Let’s
use 16843009. And, say, for example, that we know that $n$ is not a prime number. In this
case, I know it isn’t because I multiplied two prime numbers together to make $n$. (For the
crypto weenies out there, you know that there are a lot of numbers lying around which were
made by multiplying two prime numbers together. And, you probably wouldn’t mind finding
the factors of some of them.) In cases where you don’t know, a priori, that the number is
composite, there are a variety of methods to test for compositeness.


Let’s assume that $n$ has a factor $d$. Since we know $n$ is composite, we know that there must
be one. We just don’t know what its value happens to be. But, there are some things that
we do know about $d$. First of all, $d$ is smaller than $n$. In fact, there are at least some $d$ which
are no bigger than the square root of $n$.


So, how does this help? If you start picking numbers at random (keeping your numbers
greater or equal to zero and strictly less than $n$), then the only time you will get $a \equiv
b \pmod n$ is when $a$ and $b$ are identical. However, since $d$ is smaller than $n$, there is a good
chance that $a \equiv b \pmod d$ sometimes when $a \neq b$.


Well, if $a \equiv b \pmod d$, that means that $(a - b)$ is a multiple of $d$. Since $n$ is also a multiple
of $d$, the greatest common divisor of $(a - b)$ and $n$ is a positive, integer multiple of $d$. We can
keep picking numbers randomly until the greatest common divisor of $n$ and the difference
of two of our random numbers is greater than one. Then, we can divide $n$ by whatever this
greatest common divisor turned out to be. In doing so, we have broken down $n$ into two
factors. If we suspect that the factors may be composite, we can continue trying to break
them down further by doing the algorithm again on each half.


The amazing thing here is that through all of this, we just knew there had to be some divisor
of $n$. We were able to use properties of that divisor to our advantage before we even knew
what the divisor was!


This is at the heart of Pollard’s rho method. Pick a random number a. Pick another random 
number b. See if the greatest common divisor of (a — b) and n is greater than one. If not, 
pick another random number c. Now, check the greatest common divisor of (c — b) and n. 
If that is not greater than one, check the greatest common divisor of (c — a) and n. If that 
doesn’t work, pick another random number d. Check (d—c), (d — b), and (d — a). Continue 
in this way until you find a factor. 
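

The pairwise procedure just described can be sketched as follows. This is only an illustration of the naive idea, not the rho method itself; the function name, the fixed seed, and the cap on the number of picks are assumptions made so that the example is reproducible and always terminates.

```python
import random
from math import gcd


def random_difference_factor(n, seed=0, max_picks=10_000):
    """Pick random residues and test gcd(c - b, n) against every earlier pick b.

    Returns (factor, number_of_picks) or None if no factor was found.
    """
    rng = random.Random(seed)
    picked = []
    for _ in range(max_picks):
        c = rng.randrange(n)
        for b in picked:
            d = gcd(c - b, n)
            if 1 < d < n:            # skip d == n, which only happens when c == b
                return d, len(picked) + 1
        picked.append(c)
    return None


print(random_difference_factor(16843009))
```

Each new pick is compared with all earlier ones, which is exactly the growing cost that the structured choice of “random” numbers is meant to avoid.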


As you can see from the above paragraph, this could get quite cumbersome quite quickly. By
the $k$-th iteration you will have to do $(k - 1)$ greatest common divisor checks. Fortunately,
there is a way around that. By structuring the way in which you pick “random” numbers, you
can avoid this buildup.


Let’s say we have some polynomial $f(x)$ that we can use to pick “random” numbers. Because
we’re only concerned with numbers from zero up to (but not including) $n$, we will take all of
the values of $f(x)$ modulo $n$. We start with some $x_1$. We then pick our “random” numbers
by $x_{k+1} := f(x_k) \pmod n$.


Now, say for example we get to some point $k$ where $x_k \equiv x_j \pmod d$ with $k < j$. Then,
because of the way that modulo arithmetic works, $f(x_k)$ will be congruent to $f(x_j)$ modulo
$d$. So, once we hit upon $x_k$ and $x_j$, then each element in the sequence starting with $x_k$ will be
congruent modulo $d$ to the corresponding element in the sequence starting at $x_j$. Thus, once
the sequence gets to $x_j$ it has looped back upon itself to match up with $x_k$ (when considering
them modulo $d$).


This looping is what gives the rho method its name. If you go back through (once you
determine $d$) and look at the sequence of random numbers that you used (looking at them
modulo $d$), you will see that they start off just going along by themselves for a bit. Then,
they start to come back upon themselves. They don’t typically loop the whole way back to
the first number of your sequence. So, they have a bit of a tail and a loop, just like the
Greek letter rho ($\rho$).


Before we see why that looping helps, we will first speak to why it has to happen. When
we consider a number modulo $d$, we are only considering the numbers greater than or equal
to zero and strictly less than $d$. This is a very finite set of numbers. Your random sequence
cannot possibly go on for more than $d$ numbers without having some number repeat modulo
$d$. And, if the function $f(x)$ is well-chosen, you can probably loop back a great deal sooner.


The looping helps because it means that we can get away without accumulating the number 
of greatest common divisor steps we need to perform with each new random number. In 
fact, it makes it so that we only need to do one greatest common divisor check for every 
second random number that we pick. 


Now, why is that? Let’s assume that the loop is of length $t$ and starts at the $j$-th random
number. Say that we are on the $k$-th element of our random sequence. Furthermore, say
that $k$ is greater than or equal to $j$ and $t$ divides $k$. Because $k$ is greater than $j$ we know it
is inside the looping part of the $\rho$. We also know that if $t$ divides $k$, then $t$ also divides $2k$.
What this means is that $x_{2k}$ and $x_k$ will be congruent modulo $d$ because they correspond
to the same point on the loop. Because they are congruent modulo $d$, their difference is a
multiple of $d$. So, if we check the greatest common divisor of $(x_k - x_{k/2})$ with $n$ every time we
get to an even $k$, we will find some factor of $n$ without having to do $k - 1$ greatest common
divisor calculations every time we come up with a new random number. Instead, we only
have to do one greatest common divisor calculation for every second random number.


The only open question is what to use for a polynomial $f(x)$ to get some random numbers
which don’t have too many choices modulo $d$. Since we don’t usually know much about $d$,
we really can’t tailor the polynomial too much. A typical choice of polynomial is


$$f(x) = x^2 + a$$
where $a$ is some constant which isn’t congruent to 0 or $-2$ modulo $n$. If you don’t place
those restrictions on $a$, then you will end up degenerating into the sequence $\{1, 1, 1, 1, \ldots\}$ as
soon as you hit upon some $x$ which is congruent to either 1 or $-1$ modulo $n$.


Let’s use the algorithm now to factor our number 16843009. We will use the sequence $x_1 = 1$
with $x_{k+1} = 1024 x_k^2 + 32767 \pmod n$. [I also tried it with the very basic polynomial
$f(x) = x^2 + 1$, but that one went 80 rounds before stopping so I didn’t include the table
here.]


 k    x_k         gcd(n, x_k - x_{k/2})
 1    1
 2    33791       1
 3    10832340
 4    12473782    1
 5    4239855
 6    309274      1
 7    11965503
 8    15903688    1
 9    3345998
10    2476108     1
11    11948879
12    9350010     1
13    4540646
14    858249      1
15    14246641
16    4073290     1
17    4451768
18    14770419    257


Let’s try to factor again with a different random number schema. We will use the sequence
$x_1 = 1$ with $x_{k+1} = 2048 x_k^2 + 32767 \pmod n$.


 k    x_k         gcd(n, x_k - x_{k/2})
 1    1
 2    34815       1
 3    9016138
 4    4752700     1
 5    1678844
 6    14535213    257
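

The two runs tabulated above can be reproduced with a short Python sketch. It keeps the whole sequence in a list so that $x_k$ and $x_{k/2}$ can be compared at every even $k$, just as in the tables; the function name, the step cap, and the choice to give up when the gcd equals $n$ are assumptions made for this illustration.

```python
from math import gcd


def pollard_rho(n, f, x1=1, max_steps=100_000):
    """Return (k, d) for the first even k with 1 < gcd(n, x_k - x_{k/2}) < n.

    f is the polynomial used to generate the sequence x_{k+1} = f(x_k) mod n.
    Returns None if no proper factor is found.
    """
    xs = [None, x1]                      # xs[k] holds x_k; index 0 is unused
    for k in range(2, max_steps + 1):
        xs.append(f(xs[-1]) % n)
        if k % 2 == 0:
            d = gcd(xs[k] - xs[k // 2], n)
            if 1 < d < n:
                return k, d
            if d == n:                   # the sequence has cycled modulo n itself
                return None
    return None


n = 16843009
print(pollard_rho(n, lambda x: 1024 * x * x + 32767))
print(pollard_rho(n, lambda x: 2048 * x * x + 32767))
```

Storing the full sequence costs memory proportional to the number of steps; a common alternative is to advance two copies of the sequence at different speeds and keep only the two current values.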


Version: 3 Owner: patrickwonders Author(s): patrickwonders 


153.2 quadratic sieve 


Algorithm To factor a number $n$ using the quadratic sieve, one seeks two numbers $x$ and
$y$ which are not congruent modulo $n$ with $x$ not congruent to $-y$ modulo $n$ but have $x^2 \equiv y^2
\pmod n$. If two such numbers are found, one can then say that $(x + y)(x - y) \equiv 0 \pmod n$.
Then, $x + y$ and $x - y$ must have non-trivial factors in common with $n$.


The quadratic sieve method of factoring depends upon being able to create a set of numbers
whose factorization can be expressed as a product of pre-chosen primes. These factorizations
are recorded as vectors of the exponents. Once enough vectors are collected to form a set
which contains a linear dependence, this linear dependence is exploited to find two squares
which are equivalent modulo $n$.


To accomplish this, the quadratic sieve method uses a set of prime numbers called a factor
base. Then, it searches for numbers which can be factored entirely within that factor base.
If there are $k$ prime numbers in the factor base, then each number which can be factored
within the factor base is stored as a $k$-dimensional vector where the $i$-th component of the
vector for $y$ gives the exponent of the $i$-th prime from the factor base in the factorization of
$y$. For example, if the factor base were $\{2, 3, 5, 7, 11, 13\}$, then the number $y = 2^3 \cdot 3^2 \cdot 11^5$
would be stored as the vector $(3, 2, 0, 0, 5, 0)$.


Once $k + 1$ of these vectors have been collected, there must be a linear dependence among
them. The $k + 1$ vectors are taken modulo 2 to form vectors in $\mathbb{Z}_2^k$. The linear dependence
among them is used to find a combination of the vectors which sum up to the zero vector
in $\mathbb{Z}_2^k$. Summing these vectors is equivalent to multiplying the $y$’s to which they correspond.
And, the zero vector in $\mathbb{Z}_2^k$ signals a perfect square.


To factor $n$, choose a factor base $B = \{p_1, p_2, \ldots, p_k\}$ such that $2 \in B$ and for each odd
prime $p_j$ in $B$, $n$ is a quadratic residue of $p_j$. Now, start picking $x_i$ near $\sqrt{n}$ and calculate
$y_i = x_i^2 - n$. Note that $y_i \equiv x_i^2 \pmod n$. If $y_i$ can be completely factored by numbers in $B$,
then it is called $B$-smooth. If it is not $B$-smooth, then discard $x_i$ and $y_i$ and move on to a
new choice of $x_i$. If it is $B$-smooth, then store $x_i$, $y_i$, and the vector of its exponents for the
primes in $B$. Also, record a copy of the exponent vector with each component taken modulo
2.


Once $k + 1$ vectors have been recorded, there must be a linear dependence among them.
Using the copies of the exponent vectors that were taken modulo 2, determine which ones
can be added together to form the zero vector. Multiply together the $x_i$ that correspond to
those chosen vectors and call this $x$. Also, add together the original vectors that correspond
to the chosen vectors to form a new vector $\vec{v}$. Every component of this vector will be even.
Divide each element of $\vec{v}$ by 2. Form $y = \prod_{i=1}^{k} p_i^{v_i}$.


Because each $y_i \equiv x_i^2 \pmod n$, $x^2 \equiv y^2 \pmod n$. If $x \equiv \pm y \pmod n$, then find some more
$B$-smooth numbers and try again. If $x$ is not congruent to $\pm y$ modulo $n$, then $\gcd(x + y, n)$ and
$\gcd(x - y, n)$ are non-trivial factors of $n$.


Example Consider the number $n = 16843009$. The integer nearest its square root is 4104.
Given the factor base
$$B = \{2, 3, 5, 7, 13\},$$
the first few $B$-smooth values of $y_i = f(x_i) = x_i^2 - n$ are:


 x_i     y_i = x_i^2 - n    2   3   5   7   13
 4122    147875             0   0   3   1   2
 4159    454272             7   1   0   1   2
 4187    687960             3   3   1   2   1
 4241    1143072            5   6   0   2   0
 4497    3380000            5   0   4   0   2
 4993    8087040            9   5   1   0   1


Using $x_0 = 4241$ and $x_1 = 4497$, one obtains:
$$y_0 = 1143072 = 2^5 \cdot 3^6 \cdot 5^0 \cdot 7^2 \cdot 13^0$$
$$y_1 = 3380000 = 2^5 \cdot 3^0 \cdot 5^4 \cdot 7^0 \cdot 13^2$$


Which results in:
$$x = 4241 \cdot 4497 = 19071777$$
$$y = 2^5 \cdot 3^3 \cdot 5^2 \cdot 7^1 \cdot 13^1 = 1965600$$
From there:
$$\gcd(x - y, n) = 257$$
$$\gcd(x + y, n) = 65537$$
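

The worked example can be imitated in a few lines of Python. This is a toy sketch under loud assumptions: smooth values are found by plain trial division rather than by sieving, and the linear dependence is found by brute-force search over subsets instead of Gaussian elimination over $\mathbb{Z}_2$, which is only workable for a handful of relations. The function names and the `span` parameter are illustrative.

```python
from itertools import combinations
from math import gcd, isqrt


def exponents(y, base):
    """Exponent vector of y over the factor base, or None if y is not smooth."""
    vec = []
    for p in base:
        e = 0
        while y % p == 0:
            y //= p
            e += 1
        vec.append(e)
    return vec if y == 1 else None


def quadratic_sieve_sketch(n, base, span=1000):
    """Return a non-trivial factor of n, or None if the search fails."""
    start = isqrt(n) + 1
    relations = []                       # pairs (x_i, exponent vector of x_i^2 - n)
    for x in range(start, start + span):
        vec = exponents(x * x - n, base)
        if vec is not None:
            relations.append((x, vec))
    for size in range(1, len(relations) + 1):
        for subset in combinations(relations, size):
            total = [sum(col) for col in zip(*(v for _, v in subset))]
            if any(t % 2 for t in total):
                continue                 # product of the y_i is not a square
            x = 1
            for xi, _ in subset:
                x = x * xi % n
            y = 1
            for p, t in zip(base, total):
                y = y * pow(p, t // 2, n) % n
            d = gcd(x - y, n)
            if 1 < d < n:                # skip the useless cases x = +-y mod n
                return d
    return None


print(quadratic_sieve_sketch(16843009, [2, 3, 5, 7, 13]))
```

The search may combine a different pair of rows than the one chosen in the example, so it can report either of the two prime factors of $n$.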
It may not be completely obvious why we required that $n$ be a quadratic residue of each $p_i$ in
the factor base $B$. One might intuitively think that we actually want the $p_i$ to be quadratic
residues of $n$ instead. But, that is not the case.


We are trying to express $n$ as:
$$(x + y)(x - y) = x^2 - y^2 = n$$
where
$$y = \prod_{i=1}^{k} p_i^{v_i}$$


Because we end up squaring $y$, there is no reason that the $p_i$ would need to be quadratic
residues of $n$.


So, why do we require that $n$ be a quadratic residue of each $p_i$? We can rewrite the $x^2 - y^2 = n$
as:
$$x^2 - \prod_{i=1}^{k} p_i^{2 v_i} = n$$


If we take that expression modulo $p_i$ for any $p_i$ for which the corresponding $v_i$ is non-zero,
we are left with:
$$x^2 \equiv n \pmod{p_i}$$


Thus, in order for $p_i$ to show up in a useful solution, $n$ must be a quadratic residue of
$p_i$. We would be wasting time and space to employ other primes in our factoring and
linear combinations.


Version: 6 Owner: patrickwonders Author(s): patrickwonders


Chapter 154


11Y55 - Calculation of integer
sequences


154.1 Kolakoski sequence 


A “self-describing” sequence $\{k_n\}$ of alternating blocks of 1’s and 2’s, given by the
following rules:


• $k_0 = 1$.¹
• $k_n$ is the length of the $(n + 1)$’th block.


Thus, the sequence begins 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, ...
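

The two rules translate directly into a generator that reads its own output to decide the length of each new block. The sketch below is the straightforward linear-time construction, not the hierarchy of generators mentioned later in this entry; the function name is illustrative.

```python
def kolakoski(n):
    """Return the first n terms of the Kolakoski sequence starting 1, 2, 2."""
    seq = [1, 2, 2]
    i = 2                                   # seq[i] is the length of the next block
    while len(seq) < n:
        symbol = 1 if seq[-1] == 2 else 2   # blocks alternate between 1's and 2's
        seq.extend([symbol] * seq[i])
        i += 1
    return seq[:n]


print(kolakoski(16))
print(kolakoski(10_000).count(1) / 10_000)  # empirical density of 1's
```

The first line of output can be compared with the terms listed above, and the second gives a quick empirical look at the density of 1’s.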


It is conjectured that the density of 1's in the sequence is 0.5. It is not known whether the
1's have a density; however, it is known that were this true, that density would be 0.5. It is
also not known whether the sequence is a strongly recurrent sequence; this too would imply
density 0.5.

Extensive computer experiments strongly support the conjecture. Furthermore, if $o_n$ is the
number of 1's in the first $n$ elements, then it appears that $o_n = 0.5n + O(\log n)$. Note for
comparison that for a random sequence of 1's and 2's, the number of 1's in the first $n$
elements is with high probability $0.5n + O(\sqrt{n})$.

To rapidly generate a large number of elements of the sequence, it is most efficient to build
a hierarchy of generators for the sequence. If the conjecture is correct, then the depth of
this hierarchy is only $O(\log n)$ to generate the first $n$ elements.

This is sequence A000002 in the Online Encyclopedia of Integer Sequences.

¹ Some sources start the sequence at $k_0 = 2$, instead. This only has the effect of shifting the sequence by
one position.


Version: 1 Owner: ariels Author(s): ariels
Chapter 155


11Z05 — Miscellaneous applications of 
number theory 


155.1 τ function


The τ function takes positive integers as its input and gives the number of positive divisors
of its input as its output. For example, since 1, 2, and 4 are all of the positive divisors of 4,
then $\tau(4) = 3$. As another example, since 1, 2, 5, and 10 are all of the positive divisors of
10, then $\tau(10) = 4$.


The τ function behaves according to the following two rules:

1. If $p$ is a prime and $x$ is a nonnegative integer, then $\tau(p^x) = x + 1$.

2. If $\gcd(a, b) = 1$, then $\tau(ab) = \tau(a)\tau(b)$.

Because these two rules hold for the τ function, it is a multiplicative function.


Note that these rules work for the previous two examples. Since 2 is prime, then $\tau(4) =
\tau(2^2) = 2 + 1 = 3$. Since 2 and 5 are distinct primes, then $\tau(10) = \tau(2 \cdot 5) = \tau(2)\tau(5) =
(1 + 1)(1 + 1) = 4$.
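The two rules give a direct way to compute τ from a prime factorization. The following Python sketch uses trial division; the function name is illustrative.

```python
def tau(n):
    """Number of positive divisors of n, via tau(p**x) = x + 1 and multiplicativity."""
    if n < 1:
        raise ValueError("tau is defined for positive integers only")
    result = 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:       # strip the full power of p from n
            n //= p
            exponent += 1
        result *= exponent + 1  # rule 1 combined with rule 2
        p += 1
    if n > 1:                   # one prime factor larger than sqrt remains
        result *= 2
    return result

print(tau(4), tau(10))
```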


The τ function is extremely useful for studying cyclic rings.
Version: 7 Owner: Wkbj79 Author(s): Wkbj79 


155.2 arithmetic derivative 
The arithmetic derivative $n'$ of a natural number $n$ is defined by the following rules:
• $p' = 1$ for any prime $p$.
• $(ab)' = a'b + ab'$ for any $a, b \in \mathbb{N}$ (Leibniz rule).


Note: One of the major contributors to the theory of the "arithmetic derivative" is E.J.
Barbeau, who published "Remarks on an arithmetic derivative" in 1961.


Version: 2 Owner: Johan Author(s): Johan 


155.3 example of arithmetic derivative 


Consider the natural number 6. Using the rules of the arithmetic derivative we get:
$$6' = (2 \cdot 3)' = 2' \cdot 3 + 2 \cdot 3' = 1 \cdot 3 + 2 \cdot 1 = 5$$


Below is a list of the first 10 natural numbers and their first and second arithmetic derivatives:
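Such a list can be regenerated with a short Python sketch. It relies on the closed form $n' = \sum n/p$, summed over the prime factors $p$ of $n$ with multiplicity, which follows from the two rules. The values $0' = 0$ and $1' = 0$ are also consequences of the Leibniz rule, but they are not stated in the entry, so treat them as assumptions of this sketch.

```python
def arithmetic_derivative(n):
    """Arithmetic derivative of a natural number n (0' and 1' taken as 0)."""
    if n < 2:
        return 0
    result = 0
    m = n
    p = 2
    while p * p <= m:
        while m % p == 0:
            result += n // p  # each prime factor p contributes n / p
            m //= p
        p += 1
    if m > 1:                 # remaining prime factor
        result += n // m
    return result

for n in range(1, 11):
    first = arithmetic_derivative(n)
    print(n, first, arithmetic_derivative(first))
```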


Version: 4 Owner: Johan Author(s): Johan 


155.4 proof that τ(n) is the number of positive divisors
of n


Following is a proof that, if τ behaves according to the following two rules...


1. If $p$ is a prime and $x$ is a nonnegative integer, then $\tau(p^x) = x + 1$.
2. If $\gcd(a, b) = 1$, then $\tau(ab) = \tau(a)\tau(b)$.
...then τ counts the positive divisors of its input, which must be a positive integer.


Let $p$ be a prime. Then $p^0 = 1$. Since 1 is the only positive divisor of 1 and $\tau(1) = \tau(p^0) =
0 + 1 = 1$, then $\tau(1)$ is equal to the number of positive divisors of 1.


Suppose that, for all positive integers $k$ smaller than $z \in \mathbb{Z}$ with $z > 1$, the number of
positive divisors of $k$ is $\tau(k)$. Since $z > 1$, then $z$ has a prime divisor. Let $p$ be a prime that
divides $z$. Let $x \in \mathbb{Z}^+$ such that $p^x$ divides $z$ and $p^{x+1}$ does not divide $z$. Let $a \in \mathbb{Z}^+$ such
that $z = p^x a$. Then $\gcd(a, p) = 1$. Thus, $\gcd(a, p^x) = 1$. Since $a < z$, then, by the induction
hypothesis, there are $\tau(a)$ positive divisors of $a$.

Let $d$ be a positive divisor of $z$. Let $y$ be a nonnegative integer such that $p^y$ divides $d$
and $p^{y+1}$ does not divide $d$. Thus, $0 \le y \le x$, and there are $x + 1$ choices for $y$. Let
$c \in \mathbb{Z}^+$ such that $d = p^y c$. Then $\gcd(c, p) = 1$. Since $c$ divides $d$ and $d$ divides $z$, then
$c$ divides $z$. Since $c$ divides $p^x a$ and $\gcd(c, p) = 1$, then $c$ divides $a$. Thus, there are $\tau(a)$
choices for $c$. Since there are $x + 1$ choices for $y$ and there are $\tau(a)$ choices for $c$, then there
are $(x + 1)\tau(a)$ choices for $d$. Hence, there are $(x + 1)\tau(a)$ positive divisors of $z$. Since
$\tau(z) = \tau(p^x a) = \tau(p^x)\tau(a) = (x + 1)\tau(a)$, it follows that, for every $n \in \mathbb{Z}^+$, the number of
positive divisors of $n$ is $\tau(n)$.


Version: 2 Owner: Wkbj79 Author(s): Wkbj79
Chapter 156


12-00 — General reference works 
(handbooks, dictionaries, 
bibliographies, etc.) 


156.1 monomial 


A monomial is a product of non-negative powers of variables. It may also include an
optional coefficient (which is sometimes ignored when discussing particular properties of
monomials). A polynomial can be thought of as a sum over a set of monomials.


For example, the following are monomials. 


1 x xy 


ryz 3e°y'2? =z 


If there are $n$ variables from which a monomial may be formed, then a monomial may be
represented without its coefficient as a vector of $n$ non-negative integers. Each position in this vector
would correspond to a particular variable, and the value of the element at each position
would correspond to the power of that variable in the monomial. For instance, the monomial
$x^2 y z^3$ formed from the set of variables $\{w, x, y, z\}$ would be represented as $(0\ 2\ 1\ 3)^T$. A
constant would be a zero vector.


Given this representation, we may define a few more concepts. First, the degree of a
monomial is the sum of the elements of its vector representation. Thus, the degree of $x^2 y z^3$
is $0 + 2 + 1 + 3 = 6$, and the degree of a constant is 0. If a polynomial is represented as a
sum over a set of monomials, then the degree of a polynomial can be defined as the degree
of the monomial of largest degree belonging to that polynomial.

156.2 order and degree of polynomial


Let $f$ be a polynomial in two variables, viz. $f(x, y) = \sum_{i,j} a_{ij} x^i y^j$.¹
Then the degree of $f$ is given by:
$$\deg f = \sup\{i + j \mid a_{ij} \neq 0\}$$

Note the degree of the zero-polynomial is $-\infty$, since $\sup \emptyset$ (per definition) is $-\infty$, thus
$\deg f \in \mathbb{N} \cup \{0\} \cup \{-\infty\}$.
Similarly the order of $f$ is given by:
$$\operatorname{ord} f = \inf\{i + j \mid a_{ij} \neq 0\}$$
Note the order of the zero-polynomial is $\infty$ (because $\inf \emptyset = \infty$). Thus $\operatorname{ord} f \in \mathbb{N} \cup \{0\} \cup \{\infty\}$.
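Both definitions are easy to evaluate when a polynomial is stored as a map from exponent pairs to coefficients. This is a minimal sketch: the dictionary representation and the function names are assumptions made for illustration.

```python
import math

def degree(poly):
    """deg f = sup{i + j : a_ij != 0}; poly maps (i, j) to the coefficient a_ij."""
    totals = [i + j for (i, j), coeff in poly.items() if coeff != 0]
    return max(totals) if totals else -math.inf  # sup of the empty set

def order(poly):
    """ord f = inf{i + j : a_ij != 0}."""
    totals = [i + j for (i, j), coeff in poly.items() if coeff != 0]
    return min(totals) if totals else math.inf   # inf of the empty set

f = {(2, 1): 1, (1, 3): 3, (0, 0): -5}  # x^2 y + 3 x y^3 - 5
print(degree(f), order(f))
```

Terms stored with a zero coefficient are ignored, so the zero polynomial gets degree $-\infty$ and order $\infty$ regardless of how it is stored.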


Please note that the term order is not as common as degree. In fact, it is perhaps more
frequently associated with power series (a form of generalized polynomials) than with
ordinary polynomials. Also be aware that the term order occasionally is used as a synonym for
degree.


Version: 4 Owner: jgade Author(s): jgade 


¹ In order to simplify the notation, the definition is given in terms of a polynomial in two variables; however,
the definition naturally scales to any number of variables.
Chapter 157


12-XX — Field theory and polynomials


157.1 homogeneous polynomial 


A polynomial $P(x_1, \ldots, x_n)$ of degree $k$ is called homogeneous if $P(cx_1, \ldots, cx_n) = c^k P(x_1, \ldots, x_n)$
for all $c$.

An equivalent definition is that all terms of the polynomial have the same degree (i.e. $k$).
Observe that a polynomial $P$ is homogeneous iff $\deg P = \operatorname{ord} P$.


As an important example of homogeneous polynomials one can mention the elementary symmetric polynomials.


Version: 7 Owner: jgade Author(s): jgade 


157.2 subfield 
Let $F$ be a field and $S$ a subset such that $S$ with the inherited operations from $F$ is also a
field. Then we say that $S$ is a subfield of $F$.


Version: 2 Owner: drini Author(s): drini, apmxi
Chapter 158


12D05 — Polynomials: factorization 


158.1 factor theorem 


If $f(x)$ is a polynomial then $x - a$ is a factor if and only if $a$ is a root (that is, $f(a) = 0$).


This theorem is of great help for finding factorizations of higher-degree polynomials. As an
example, suppose that we need to factor the polynomial $p(x) = x^3 + 3x^2 - 33x - 35$.
With some help of the rational root theorem we can find that $x = -1$ is a root (that is,
$p(-1) = 0$), so we know $(x + 1)$ must be a factor of the polynomial. We can write then
$$p(x) = (x + 1)q(x)$$
where the polynomial $q(x)$ can be found using long or synthetic division of $p(x)$ by
$x + 1$. Some calculations show us that for the example $q(x) = x^2 + 2x - 35$, which can be
easily factored as $(x - 5)(x + 7)$. We conclude that
$$p(x) = (x + 1)(x - 5)(x + 7).$$
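The division step can be sketched with synthetic division (Horner's scheme). The function name and the coefficient ordering, highest degree first, are choices made here for illustration.

```python
def synthetic_division(coeffs, a):
    """Divide the polynomial [c_n, ..., c_0] by (x - a).

    Returns (quotient coefficients, remainder); the remainder equals p(a).
    """
    values = []
    carry = 0
    for c in coeffs:
        carry = carry * a + c  # Horner step
        values.append(carry)
    return values[:-1], values[-1]

p = [1, 3, -33, -35]                        # x^3 + 3x^2 - 33x - 35
quotient, remainder = synthetic_division(p, -1)
print(quotient, remainder)
```

A zero remainder confirms that $x + 1$ is a factor, and the quotient coefficients are those of $q(x) = x^2 + 2x - 35$.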
158.2 proof of factor theorem


Suppose that $f(x)$ is a polynomial of degree $n - 1$. It is infinitely differentiable and therefore
has a Taylor series expansion about $a$. Since $f^{(n)}(x) = 0$, the expansion terminates after the
$(n - 1)$th term. Also, the $n$th remainder of the Taylor series vanishes.


Thus the function is equal to its Taylor series:
$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n-1)}(a)}{(n-1)!}(x - a)^{n-1}$$


Now if $x - a$ is a factor of $f(x)$ then we can write $f(x) = (x - a)g(x)$ for some polynomial
$g(x)$. If $f(a) = 0$ we have
$$f(x) = (x - a) \sum_{k=1}^{n-1} \frac{f^{(k)}(a)}{k!}(x - a)^{k-1},$$
and therefore $f(x) = (x - a)g(x)$. So $x - a$ is a factor of $f(x)$. Now if $f(x) = (x - a)g(x)$,
that is if $x - a$ is a factor of $f(x)$, we immediately have $f(a) = (a - a)g(a) = 0$. Thus $x - a$
is a factor of $f(x)$ if and only if $f(a) = 0$.
158.3 proof of rational root theorem


Let $p/q$ be a root of $p(x)$. Then we have
$$a_n \left(\frac{p}{q}\right)^n + a_{n-1} \left(\frac{p}{q}\right)^{n-1} + \cdots + a_1 \left(\frac{p}{q}\right) + a_0 = 0.$$

Now multiply through by $q^n$, and do some simple rearrangements to obtain:
$$a_n p^n + a_{n-1} p^{n-1} q + \cdots + a_1 p q^{n-1} + a_0 q^n = 0$$
$$a_0 q^n = -a_n p^n - a_{n-1} p^{n-1} q - \cdots - a_1 p q^{n-1}$$
$$a_0 q^n = p\left(-a_n p^{n-1} - a_{n-1} p^{n-2} q - \cdots - a_1 q^{n-1}\right).$$

So $p \mid a_0 q^n$ and by hypothesis $\gcd(p, q) = 1$. This implies that $p \mid a_0$. After similar
rearrangements, we obtain:
$$a_n p^n = q\left(-a_{n-1} p^{n-1} - \cdots - a_1 p q^{n-2} - a_0 q^{n-1}\right).$$
So $q \mid a_n p^n$ and $q \mid a_n$.


Version: 2 Owner: bs Author(s): bs 




158.4 rational root theorem 


Consider the polynomial
$$p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
where all the coefficients $a_i$ are integers.
If $p(x)$ has a rational root $p/q$ where $\gcd(p, q) = 1$, then $p \mid a_0$ and $q \mid a_n$.
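The theorem reduces the search for rational roots to a finite candidate list. A minimal Python sketch follows; the function names are illustrative, and it assumes integer coefficients with $a_n \neq 0$ and $a_0 \neq 0$.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """All rational roots of the polynomial [a_n, ..., a_1, a_0]."""
    a_n, a_0 = coeffs[0], coeffs[-1]
    # Candidates p/q with p | a_0 and q | a_n, in both signs.
    candidates = {Fraction(sign * p, q)
                  for p in divisors(a_0)
                  for q in divisors(a_n)
                  for sign in (1, -1)}

    def evaluate(x):
        value = Fraction(0)
        for c in coeffs:
            value = value * x + c  # Horner evaluation, exact arithmetic
        return value

    return sorted(r for r in candidates if evaluate(r) == 0)

print(rational_roots([1, 3, -33, -35]))
```

Exact `Fraction` arithmetic avoids the false negatives that floating-point evaluation could produce.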


This theorem is a special case of a result about polynomials whose coefficients belong to a
unique factorization domain. For a monic polynomial, that result states that any root in the
fraction field is also in the base domain.


Version: 3 Owner: drini Author(s): drini 


158.5 sextic equation 


The sextic equation is the univariate polynomial equation of the sixth degree:
$$x^6 + ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0.$$

Joubert showed in 1861 that this polynomial can be reduced without any form of accessory
irrationalities to the 3-parameterized resolvent
$$x^6 + ax^4 + bx^2 + cx + c = 0.$$

This polynomial was studied in great detail by Felix Klein and Robert Fricke in the 19th
century and it is directly related to the algebraic aspect of Hilbert's 13th Problem. Its solution
has been reduced (by Klein) to the solution of the so-called Valentiner Form problem, a
ternary form problem which seeks the ratios of the variables involved in the invariant system
of the Valentiner group of order 360. It can also be solved with generalized
hypergeometric functions by Birkeland's approach to algebraic equations. Scott Crass has
given an explicit solution to the Valentiner problem by purely iterational methods, see 


Version: 9 Owner: mathcam Author(s): ottem, mathcam
Chapter 159


12D10 — Polynomials: location of 
zeros (algebraic theorems) 


159.1 Cardano’s derivation of the cubic formula 


To solve the cubic polynomial equation $x^3 + ax^2 + bx + c = 0$ for $x$, the first step is to apply
the Tchirnhaus transformation $x = y - \frac{a}{3}$. This reduces the equation to $y^3 + py + q = 0$,
where
$$p = b - \frac{a^2}{3}$$
$$q = c - \frac{ab}{3} + \frac{2a^3}{27}$$

The next step is to substitute $y = u - v$, to obtain
$$(u - v)^3 + p(u - v) + q = 0 \qquad (159.1.1)$$
or, with the terms collected,
$$(q - (v^3 - u^3)) + (u - v)(p - 3uv) = 0 \qquad (159.1.2)$$

From equation (159.1.2), we see that if $u$ and $v$ are chosen so that $q = v^3 - u^3$ and $p = 3uv$,
then $y = u - v$ will satisfy equation (159.1.1), and the cubic will be solved!

There remains the matter of solving $q = v^3 - u^3$ and $p = 3uv$ for $u$ and $v$. From the second
equation, we get $v = p/(3u)$, and substituting this $v$ into the first equation yields
$$q = \frac{p^3}{27u^3} - u^3,$$
which is a quadratic equation in $u^3$. Solving for $u^3$ using the quadratic formula we get
$$u^3 = \frac{-27q + \sqrt{108p^3 + 729q^2}}{54}$$
$$v^3 = \frac{27q + \sqrt{108p^3 + 729q^2}}{54}$$

Using these values for $u$ and $v$, you can back-substitute $y = u - v$, $p = b - a^2/3$, $q =
c - ab/3 + 2a^3/27$, and $x = y - a/3$ to get the expression for the first root $r_1$ in the cubic
formula. The second and third roots $r_2$ and $r_3$ are obtained by performing synthetic division
using $r_1$, and using the quadratic formula on the remaining quadratic factor.
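The derivation can be followed numerically. The sketch below returns one root using complex arithmetic. Two details are choices of this sketch rather than part of the derivation: it takes whichever root of the quadratic in $u^3$ has the larger magnitude (either root leads to a valid pair $u, v$), and it computes $v$ as $p/(3u)$ so that the cube roots are paired consistently with $p = 3uv$.

```python
import cmath

def cardano_root(a, b, c):
    """Return one (possibly complex) root of x**3 + a*x**2 + b*x + c = 0."""
    p = b - a * a / 3
    q = c - a * b / 3 + 2 * a**3 / 27
    s = cmath.sqrt(108 * p**3 + 729 * q**2)
    # Pick the root of the quadratic in u**3 with larger magnitude so that
    # the division below is safe whenever p and q are not both zero.
    u_cubed = max((-27 * q + s) / 54, (-27 * q - s) / 54, key=abs)
    if u_cubed == 0:            # only possible when p == q == 0
        y = 0
    else:
        u = u_cubed ** (1 / 3)  # principal complex cube root
        v = p / (3 * u)         # enforces p = 3uv
        y = u - v
    return y - a / 3

root = cardano_root(0, -6, -9)  # x^3 - 6x - 9 has the real root 3
print(root)
```

The result is complex-typed; for a real root its imaginary part is zero up to rounding.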
159.2 Ferrari-Cardano derivation of the quartic formula


Given a quartic equation $x^4 + ax^3 + bx^2 + cx + d = 0$, apply the Tchirnhaus transformation
$x \mapsto y - \frac{a}{4}$ to obtain
$$y^4 + py^2 + qy + r = 0 \qquad (159.2.1)$$
where
$$p = b - \frac{3a^2}{8}$$
$$q = c - \frac{ab}{2} + \frac{a^3}{8}$$
$$r = d - \frac{ac}{4} + \frac{a^2 b}{16} - \frac{3a^4}{256}$$

A solution to Equation (159.2.1) yields a solution of the original equation, so we replace the original equation
with Equation (159.2.1). Move $qy + r$ to the other side and complete the square on the left
to get:
$$(y^2 + p)^2 = py^2 - qy + (p^2 - r).$$

We now wish to add the quantity $(y^2 + p + z)^2 - (y^2 + p)^2$ to both sides, for some unspecified
value of $z$ whose purpose will be made clear in what follows. Note that $(y^2 + p + z)^2 - (y^2 + p)^2$
is a quadratic in $y$. Carrying out this addition, we get
$$(y^2 + p + z)^2 = (p + 2z)y^2 - qy + (z^2 + 2pz + p^2 - r) \qquad (159.2.2)$$

The goal is now to choose a value for $z$ which makes the right hand side of Equation (159.2.2)
a perfect square. The right hand side is a quadratic polynomial in $y$ whose discriminant is
$$-8z^3 - 20pz^2 + (8r - 16p^2)z + q^2 + 4pr - 4p^3.$$

Our goal will be achieved if we can find a value for $z$ which makes this discriminant zero.
But the above polynomial is a cubic polynomial in $z$, so its roots can be found using the
cubic formula. Choosing then such a value for $z$, we may rewrite Equation (159.2.2) as
$$(y^2 + p + z)^2 = (sy + t)^2$$
for some (complicated!) values $s$ and $t$, and then taking the square root of both sides and
solving the resulting quadratic equation in $y$ provides a root of Equation (159.2.1).
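The first two stages of this derivation can be checked with exact arithmetic. The sketch below computes the depressed coefficients and the discriminant cubic in $z$; the function names and the sample quartic $(x-1)(x-2)(x-3)(x-4)$ are assumptions chosen for illustration.

```python
from fractions import Fraction

def depress_quartic(a, b, c, d):
    """Coefficients (p, q, r) of y^4 + p y^2 + q y + r after x = y - a/4."""
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    p = b - 3 * a**2 / 8
    q = c - a * b / 2 + a**3 / 8
    r = d - a * c / 4 + a**2 * b / 16 - 3 * a**4 / 256
    return p, q, r

def rhs_discriminant(p, q, r, z):
    """Discriminant in y of (p + 2z) y^2 - q y + (z^2 + 2pz + p^2 - r)."""
    return (-8 * z**3 - 20 * p * z**2 + (8 * r - 16 * p**2) * z
            + q**2 + 4 * p * r - 4 * p**3)

# (x-1)(x-2)(x-3)(x-4) = x^4 - 10x^3 + 35x^2 - 50x + 24
p, q, r = depress_quartic(-10, 35, -50, 24)
print(p, q, r)
```

For this sample $q = 0$, so $z = -p/2$ makes the discriminant vanish; in general $z$ has to come from solving the cubic.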


Version: 5 Owner: djao Author(s): djao 


159.3 Galois-theoretic derivation of the cubic formula 


We are trying to find the roots $r_1, r_2, r_3$ of the polynomial $x^3 + ax^2 + bx + c = 0$. From the
equation
$$(x - r_1)(x - r_2)(x - r_3) = x^3 + ax^2 + bx + c$$
we see that
$$a = -(r_1 + r_2 + r_3)$$
$$b = r_1 r_2 + r_1 r_3 + r_2 r_3$$
$$c = -r_1 r_2 r_3$$

The goal is to explicitly construct a radical tower over the field $k = \mathbb{C}(a, b, c)$ that
contains the three roots $r_1, r_2, r_3$.

Let $L = \mathbb{C}(r_1, r_2, r_3)$. By Galois theory we know that $\operatorname{Gal}(L/\mathbb{C}(a, b, c)) = S_3$. Let $K \subset L$
be the fixed field of $A_3 \subset S_3$. We have a tower of field extensions

    L = C(r_1, r_2, r_3)
      |  A_3
    K = ?
      |  S_3/A_3
    k = C(a, b, c)

which we know from Galois theory is radical. We use Galois theory to find $K$ and exhibit
radical generators for these extensions.

Let $\sigma := (123)$ be a generator of $\operatorname{Gal}(L/K) = A_3$. Let $\omega = e^{2\pi i/3} \in \mathbb{C} \subset L$ be a primitive
cube root of unity. Since $\omega$ has norm 1, Hilbert's Theorem 90 tells us that $\omega = y/\sigma(y)$ for
some $y \in L$. Galois theory (or Kummer theory) then tells us that $L = K(y)$ and $y^3 \in K$,
thus exhibiting $L$ as a radical extension of $K$.

The proof of Hilbert's Theorem 90 provides a procedure for finding $y$, which is as follows:
choose any $x \in L$, form the quantity
$$\omega x + \omega^2 \sigma(x) + \omega^3 \sigma^2(x);$$
then this quantity automatically yields a suitable value for $y$ provided that it is nonzero. In
particular, choosing $x = r_2$ yields
$$y = r_1 + \omega r_2 + \omega^2 r_3,$$
and we have $L = K(y)$ with $y^3 \in K$. Moreover, since $\tau := (23)$ does not fix $y^3$, it follows
that $y^3 \notin k$, and this, combined with $[K : k] = 2$, shows that $K = k(y^3)$.

Set $z := \tau(y) = r_1 + \omega^2 r_2 + \omega r_3$. Applying the same technique to the extension $K/k$, we find
that $K = k(y^3 - z^3)$ with $(y^3 - z^3)^2 \in k$, and this exhibits $K$ as a radical extension of $k$.

To get explicit formulas, start with $y^3 + z^3$ and $y^3 z^3$, which are fixed by $S_3$ and thus guaranteed
to be in $k$. Using the reduction algorithm for symmetric polynomials, we find
$$y^3 + z^3 = -2a^3 + 9ab - 27c$$
$$y^3 z^3 = (a^2 - 3b)^3$$
Solving this system for $y$ and $z$ yields
$$y = \left(\frac{-2a^3 + 9ab - 27c + \sqrt{(2a^3 - 9ab + 27c)^2 + 4(-a^2 + 3b)^3}}{2}\right)^{1/3}$$
$$z = \left(\frac{-2a^3 + 9ab - 27c - \sqrt{(2a^3 - 9ab + 27c)^2 + 4(-a^2 + 3b)^3}}{2}\right)^{1/3}$$

Now we solve the linear system
$$a = -(r_1 + r_2 + r_3)$$
$$y = r_1 + \omega r_2 + \omega^2 r_3$$
$$z = r_1 + \omega^2 r_2 + \omega r_3$$
and we get
$$r_1 = \frac{1}{3}(-a + y + z)$$
$$r_2 = \frac{1}{3}(-a + \omega^2 y + \omega z)$$
$$r_3 = \frac{1}{3}(-a + \omega y + \omega^2 z)$$
which expresses $r_1, r_2, r_3$ as radical expressions of $a, b, c$ by way of the previously obtained
expressions for $y$ and $z$, and completes the derivation of the cubic formula.
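A numerical sketch of these formulas is given below. Computing $y$ and $z$ by independent cube roots can pair the wrong branches, so the sketch instead obtains $z$ from the relation $yz = a^2 - 3b$ (which the entry gives in cubed form). That pairing, the choice of the larger candidate for $y^3$, and the function name are assumptions of this sketch.

```python
import cmath

OMEGA = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity

def cubic_roots(a, b, c):
    """All three roots of x**3 + a*x**2 + b*x + c = 0 via y and z."""
    s = -2 * a**3 + 9 * a * b - 27 * c   # y^3 + z^3
    yz = a * a - 3 * b                   # y z
    root = cmath.sqrt(s * s - 4 * yz**3)
    y_cubed = max((s + root) / 2, (s - root) / 2, key=abs)
    if y_cubed == 0:                     # y = z = 0: a triple root
        return [-a / 3] * 3
    y = y_cubed ** (1 / 3)               # principal complex cube root
    z = yz / y                           # keeps the branches consistent
    return [(-a + y + z) / 3,
            (-a + OMEGA**2 * y + OMEGA * z) / 3,
            (-a + OMEGA * y + OMEGA**2 * z) / 3]

roots = cubic_roots(-7, 14, -8)          # (x - 1)(x - 2)(x - 4)
print(roots)
```

The roots come back in an order that depends on the cube-root branch, so only the set of roots is meaningful.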
159.4 Galois-theoretic derivation of the quartic formula


Let $x^4 + ax^3 + bx^2 + cx + d$ be a general polynomial with four roots $r_1, r_2, r_3, r_4$, so
$(x - r_1)(x - r_2)(x - r_3)(x - r_4) = x^4 + ax^3 + bx^2 + cx + d$. The goal is to exhibit the field extension
$\mathbb{C}(r_1, r_2, r_3, r_4)/\mathbb{C}(a, b, c, d)$ as a radical extension, thereby expressing $r_1, r_2, r_3, r_4$ in terms of
$a, b, c, d$ by radicals.

Write $N$ for $\mathbb{C}(r_1, r_2, r_3, r_4)$ and $F$ for $\mathbb{C}(a, b, c, d)$. The Galois group $\operatorname{Gal}(N/F)$ is the
symmetric group $S_4$, the permutation group on the four elements $\{r_1, r_2, r_3, r_4\}$, which has
a composition series
$$1 \triangleleft \mathbb{Z}/2 \triangleleft V_4 \triangleleft A_4 \triangleleft S_4,$$
where:

• $A_4$ is the alternating group in $S_4$, consisting of the even permutations.
• $V_4 = \{1, (12)(34), (13)(24), (14)(23)\}$ is the Klein four-group.
• $\mathbb{Z}/2$ is the two-element subgroup $\{1, (12)(34)\}$ of $V_4$.

Under the Galois correspondence, each of these subgroups corresponds to an intermediate
field of the extension $N/F$. We denote these fixed fields by (in increasing order) $K$, $L$, and
$M$.
We thus have a tower of field extensions, and corresponding automorphism groups:

    Subgroup    Fixed field
    1           N
    Z/2         M
    V_4         L
    A_4         K
    S_4         F

By Galois theory, or Kummer theory, each field in this diagram is a radical extension of the
one below it, and our job is done if we explicitly find what the radical extension is in each
case.
We start with $K/F$. The index of $A_4$ in $S_4$ is two, so $K/F$ is a degree two extension. We
have to find an element of $K$ that is not in $F$. The easiest such element to take is the element
$\Delta$ obtained by taking the products of the differences of the roots, namely,
$$\Delta := \prod_{1 \le i < j \le 4} (r_i - r_j) = (r_1 - r_2)(r_1 - r_3)(r_1 - r_4)(r_2 - r_3)(r_2 - r_4)(r_3 - r_4).$$

Observe that $\Delta$ is fixed by any even permutation of the roots $r_i$, but that $\sigma(\Delta) = -\Delta$ for
any odd permutation $\sigma$. Accordingly, $\Delta^2$ is actually fixed by all of $S_4$, so:

• $\Delta \in K$, but $\Delta \notin F$.
• $\Delta^2 \in F$.
• $K = F[\Delta] = F[\sqrt{\Delta^2}]$, thus exhibiting $K/F$ as a radical extension.

The element $\Delta^2 \in F$ is called the discriminant of the polynomial. An explicit formula for
$\Delta^2$ can be found using the reduction algorithm for symmetric polynomials and, although it
is not needed for our purposes, we list it here for reference:
$$\Delta^2 = 256d^3 - d^2(27a^4 - 144a^2 b + 128b^2 + 192ac)$$
$$\qquad - c^2(27c^2 - 18abc + 4a^3 c + 4b^3 - a^2 b^2)$$
$$\qquad + 2d\left(abc(9a^2 - 40b) - 2b^3(a^2 - 4b) - 3c^2(a^2 - 24b)\right).$$

Next up is the extension $L/K$, which has degree 3 since $[A_4 : V_4] = 3$. We have to find an
element of $N$ which is fixed by $V_4$ but not by $A_4$. Luckily, the form of $V_4$ almost cries out
that the following elements be used:
$$t_1 := (r_1 + r_2)(r_3 + r_4)$$
$$t_2 := (r_1 + r_3)(r_2 + r_4)$$
$$t_3 := (r_1 + r_4)(r_2 + r_3)$$

These three elements of $N$ are fixed by everything in $V_4$, but not by everything in $A_4$. They
are therefore elements of $L$ that are not in $K$. Moreover, every permutation in $S_4$ permutes
the set $\{t_1, t_2, t_3\}$, so the cubic polynomial
$$\Phi(x) := (x - t_1)(x - t_2)(x - t_3)$$
actually has coefficients in $F$! In fancier language, the cubic polynomial $\Phi(x)$ defines a
cubic extension $E$ of $F$ which is linearly disjoint from $K$, with the composite extension $EK$
equal to $L$. The polynomial $\Phi(x)$ is called the resolvent cubic of the quartic polynomial
$x^4 + ax^3 + bx^2 + cx + d$. The coefficients of $\Phi(x)$ can be found fairly easily using (again) the
reduction algorithm for symmetric polynomials, which yields
$$\Phi(x) = x^3 - 2bx^2 + (b^2 + ac - 4d)x + (c^2 + a^2 d - abc). \qquad (159.4.1)$$
Using the cubic formula, one can find radical expressions for the three roots of this
polynomial, which are $t_1$, $t_2$, and $t_3$, and henceforth we assume radical expressions for these three
quantities are known. We also have $L = K[t_1]$, which in light of what we just said, exhibits
$L/K$ as an explicit radical extension.

The remaining extensions are easier and the reader who has followed to this point should
have no trouble with the rest. For the degree two extension $M/L$, we require an element of
$M$ that is not in $L$; one convenient such element is $r_1 + r_2$, which is a root of the quadratic
polynomial
$$(x - (r_1 + r_2))(x - (r_3 + r_4)) = x^2 + ax + t_1 \in L[x] \qquad (159.4.2)$$
and therefore equals $(-a + \sqrt{a^2 - 4t_1})/2$. Hence $M = L[r_1 + r_2] = L[(-a + \sqrt{a^2 - 4t_1})/2]$ is
a radical extension of $L$.

Finally, for the extension $N/M$, an element of $N$ that is not in $M$ is of course $r_1$, which is a
root of the quadratic polynomial
$$(x - r_1)(x - r_2) = x^2 - (r_1 + r_2)x + r_1 r_2. \qquad (159.4.3)$$

Now, $r_1 + r_2$ is known from the previous paragraph, so it remains to find an expression for
$r_1 r_2$. Note that $r_1 r_2$ is fixed by $(12)(34)$, so it is in $M$ but not in $L$. To find it, use the
equation $(t_2 + t_3 - t_1)/2 = r_1 r_2 + r_3 r_4$, which gives
$$(x - r_1 r_2)(x - r_3 r_4) = x^2 - \frac{t_2 + t_3 - t_1}{2}x + d$$
and, upon solving for $r_1 r_2$ with the quadratic formula, yields
$$r_1 r_2 = \frac{(t_2 + t_3 - t_1) + \sqrt{(t_2 + t_3 - t_1)^2 - 16d}}{4} \qquad (159.4.4)$$
$$r_3 r_4 = \frac{(t_2 + t_3 - t_1) - \sqrt{(t_2 + t_3 - t_1)^2 - 16d}}{4} \qquad (159.4.5)$$

We can then use this expression, combined with Equation (159.4.3), to solve for $r_1$ using
the quadratic formula. Perhaps, at this point, our poor reader needs a summary of the
procedure, so we give one here:

1. Find $t_1$, $t_2$, and $t_3$ by solving the resolvent cubic (Equation (159.4.1)) using the cubic
formula,

2. From Equation (159.4.2), obtain
$$r_1 + r_2 = \frac{-a + \sqrt{a^2 - 4t_1}}{2}$$
$$r_3 + r_4 = \frac{-a - \sqrt{a^2 - 4t_1}}{2}$$

3. Using Equation (159.4.3), write
$$r_1 = \frac{(r_1 + r_2) + \sqrt{(r_1 + r_2)^2 - 4(r_1 r_2)}}{2}$$
$$r_2 = \frac{(r_1 + r_2) - \sqrt{(r_1 + r_2)^2 - 4(r_1 r_2)}}{2}$$
$$r_3 = \frac{(r_3 + r_4) + \sqrt{(r_3 + r_4)^2 - 4(r_3 r_4)}}{2}$$
$$r_4 = \frac{(r_3 + r_4) - \sqrt{(r_3 + r_4)^2 - 4(r_3 r_4)}}{2}$$
where the expressions $r_1 + r_2$ and $r_3 + r_4$ are derived in the previous step, and the
expressions $r_1 r_2$ and $r_3 r_4$ come from Equation (159.4.4) and (159.4.5).

4. Now the roots $r_1, r_2, r_3, r_4$ of the quartic polynomial $x^4 + ax^3 + bx^2 + cx + d$ have been
found, and we are done!
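The symmetric-function identities used in this derivation can be sanity-checked from a quartic with known integer roots. The sketch below is only a consistency check under that assumption, not the full root-finding procedure. Function names are illustrative, and the last group of the discriminant is entered with the sign that makes it agree with the squared product of root differences.

```python
from itertools import combinations
from math import prod

def quartic_from_roots(roots):
    """Coefficients (a, b, c, d) of the monic quartic with the given roots."""
    e1 = sum(roots)
    e2 = sum(prod(t) for t in combinations(roots, 2))
    e3 = sum(prod(t) for t in combinations(roots, 3))
    e4 = prod(roots)
    return -e1, e2, -e3, e4

def resolvent_cubic(a, b, c, d):
    """Coefficients of Phi(x), highest degree first (Equation 159.4.1)."""
    return [1, -2 * b, b * b + a * c - 4 * d, c * c + a * a * d - a * b * c]

def discriminant(a, b, c, d):
    return (256 * d**3
            - d**2 * (27 * a**4 - 144 * a**2 * b + 128 * b**2 + 192 * a * c)
            - c**2 * (27 * c**2 - 18 * a * b * c + 4 * a**3 * c + 4 * b**3 - a**2 * b**2)
            + 2 * d * (a * b * c * (9 * a**2 - 40 * b)
                       - 2 * b**3 * (a**2 - 4 * b)
                       - 3 * c**2 * (a**2 - 24 * b)))

def evaluate(coeffs, x):
    value = 0
    for coeff in coeffs:
        value = value * x + coeff
    return value

r1, r2, r3, r4 = roots = (1, 2, 3, 4)
a, b, c, d = quartic_from_roots(roots)
t = [(r1 + r2) * (r3 + r4), (r1 + r3) * (r2 + r4), (r1 + r4) * (r2 + r3)]
delta = prod(x - y for x, y in combinations(roots, 2))
print(resolvent_cubic(a, b, c, d), t, delta**2)
```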
159.5 cubic formula


The three roots $r_1, r_2, r_3$ of a cubic polynomial equation $x^3 + ax^2 + bx + c = 0$ are given by
$$r_1 = -\frac{a}{3} - \frac{\sqrt[3]{2}\,(-a^2 + 3b)}{3\sqrt[3]{W}} + \sqrt[3]{\frac{W}{54}}$$
$$r_2 = -\frac{a}{3} + \frac{(1 + i\sqrt{3})(-a^2 + 3b)}{\sqrt[3]{108\,W}} - \frac{1 - i\sqrt{3}}{2}\sqrt[3]{\frac{W}{54}}$$
$$r_3 = -\frac{a}{3} + \frac{(1 - i\sqrt{3})(-a^2 + 3b)}{\sqrt[3]{108\,W}} - \frac{1 + i\sqrt{3}}{2}\sqrt[3]{\frac{W}{54}}$$
where $W$ abbreviates the expression that appears under every cube root:
$$W = -2a^3 + 9ab - 27c + \sqrt{4(-a^2 + 3b)^3 + (-2a^3 + 9ab - 27c)^2}.$$
159.6 derivation of quadratic formula


Suppose $A, B, C$ are real numbers, with $A \neq 0$, and suppose
$$Ax^2 + Bx + C = 0.$$

Since $A$ is nonzero, we can divide by $A$ and obtain the equation
$$x^2 + bx + c = 0,$$
where $b = \frac{B}{A}$ and $c = \frac{C}{A}$. This equation can be written as
$$x^2 + bx + \frac{b^2}{4} - \frac{b^2}{4} + c = 0,$$
so completing the square, i.e., applying the identity $(p + q)^2 = p^2 + 2pq + q^2$, yields
$$\left(x + \frac{b}{2}\right)^2 = \frac{b^2}{4} - c.$$

Then, taking the square root of both sides, and solving for $x$, we obtain the solution formula
$$x = -\frac{b}{2} \pm \sqrt{\frac{b^2}{4} - c} = -\frac{B}{2A} \pm \sqrt{\frac{B^2}{4A^2} - \frac{C}{A}} = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}.$$
159.7 quadratic formula


The roots of the quadratic equation
$$ax^2 + bx + c = 0, \qquad a, b, c \in \mathbb{R},\ a \neq 0$$
are given by the following formula:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

The number $\Delta = b^2 - 4ac$ is called the discriminant of the equation. If $\Delta > 0$, there are
two different real roots, if $\Delta = 0$ there is a single real root (counted twice) and if $\Delta < 0$
there are no real roots (but two different complex roots).

Let's work a few examples.

First, consider $2x^2 - 14x + 24 = 0$. Here $a = 2$, $b = -14$, $c = 24$. Substituting in the formula
gives us
$$x = \frac{14 \pm \sqrt{(-14)^2 - 4 \cdot 2 \cdot 24}}{2 \cdot 2} = \frac{14 \pm \sqrt{4}}{4} = \frac{14 \pm 2}{4}.$$
So we have two solutions (depending on whether you take the sign $+$ or $-$): $x = \frac{16}{4} = 4$ and $x = \frac{12}{4} = 3$.
Now we will solve $x^2 - x - 1 = 0$. Here $a = 1$, $b = -1$, $c = -1$ so
$$x = \frac{1 \pm \sqrt{(-1)^2 - 4(1)(-1)}}{2} = \frac{1 \pm \sqrt{5}}{2}$$
so the solutions are $x = \frac{1 + \sqrt{5}}{2}$ and $x = \frac{1 - \sqrt{5}}{2}$.
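The formula and both worked examples can be reproduced with a small Python sketch. It uses `cmath.sqrt` so that a negative discriminant yields the two complex roots; the function name is illustrative.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 with a != 0, as a (plus, minus) pair."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic equation")
    sqrt_disc = cmath.sqrt(b * b - 4 * a * c)  # complex when the discriminant is negative
    return (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)

print(quadratic_roots(2, -14, 24))
print(quadratic_roots(1, -1, -1))
```

The results are complex-typed even for real roots; take `.real` when the discriminant is non-negative.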
159.8 quartic formula


The four roots $r_1, r_2, r_3, r_4$ of a quartic polynomial equation $x^4 + ax^3 + bx^2 + cx + d = 0$ are
given by
$$r_1 = -\frac{a}{4} - \frac{S}{2} - \frac{1}{2}\sqrt{T - U}$$
$$r_2 = -\frac{a}{4} - \frac{S}{2} + \frac{1}{2}\sqrt{T - U}$$
$$r_3 = -\frac{a}{4} + \frac{S}{2} - \frac{1}{2}\sqrt{T + U}$$
$$r_4 = -\frac{a}{4} + \frac{S}{2} + \frac{1}{2}\sqrt{T + U}$$
with the abbreviations
$$P = b^2 - 3ac + 12d$$
$$Q = 2b^3 - 9abc + 27c^2 + 27a^2 d - 72bd$$
$$R = \sqrt[3]{Q + \sqrt{-4P^3 + Q^2}}$$
$$S = \sqrt{\frac{a^2}{4} - \frac{2b}{3} + \frac{\sqrt[3]{2}\,P}{3R} + \frac{R}{3\sqrt[3]{2}}}$$
$$T = \frac{a^2}{2} - \frac{4b}{3} - \frac{\sqrt[3]{2}\,P}{3R} - \frac{R}{3\sqrt[3]{2}}$$
$$U = \frac{-a^3 + 4ab - 8c}{4S}$$
159.9 reciprocal polynomial


Definition [1] Let $p : \mathbb{C} \to \mathbb{C}$ be a polynomial of degree $n$ with complex (or real) coefficients.
Then $p$ is a reciprocal polynomial if
$$p(z) = \pm z^n p(1/z)$$
for all $z \in \mathbb{C}$.

It is clear that if $z$ is a zero for a reciprocal polynomial, then $1/z$ is also a zero. This
motivates the name.
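In terms of coefficients, $z^n p(1/z)$ is $p$ with its coefficient list reversed, so the definition amounts to the list being palindromic or anti-palindromic. A minimal Python check, with an illustrative function name and a coefficient list assumed to have a nonzero leading entry:

```python
def is_reciprocal(coeffs):
    """Check p(z) = +/- z**n * p(1/z) for p given as [a_n, ..., a_0], a_n != 0."""
    reversed_coeffs = coeffs[::-1]  # coefficients of z**n * p(1/z)
    return (coeffs == reversed_coeffs
            or coeffs == [-c for c in reversed_coeffs])

print(is_reciprocal([1, -3, 1]))   # z^2 - 3z + 1
print(is_reciprocal([1, 0, -1]))   # z^2 - 1
print(is_reciprocal([1, 2, 3]))
```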


Examples of matrices whose characteristic polynomials are reciprocal are
1. orthogonal matrices,
2. involution matrices,
3. the Pascal matrices [2].


REFERENCES 


1. H. Eves, Elementary Matrix Theory, Dover publications, 1980. 
2. N.J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, 2002. 
159.10 root


Suppose you're given a function $f(x)$ where $x$ is an independent variable. Then, a root of
$f$ is a number $a$ that is a solution of the equation $f(x) = 0$ (that is, substituting $a$ for $x$ into
$f$ gives 0 as result).

Example. If $f(x) = x^2 - 4$, then $x = 2$ is a root, since $f(2) = 2^2 - 4 = 0$.
Graphically, a root of $f$ is a value where the graph of the function intersects the $x$-axis.

Of course the definition can be generalized to other kinds of functions. The domain need not
be $\mathbb{R}$, nor the codomain. As long as the codomain has some kind of 0 element, a root will
be an element of the domain belonging to the preimage of the 0. The function $f : \mathbb{R} \to \mathbb{R}$
given by $f(x) = x^2 + 1$ has no roots, but the function $f : \mathbb{C} \to \mathbb{C}$ given by $f(x) = x^2 + 1$ has
$i$ as a root.

In the special case of polynomials, there are general formulas for finding roots of polynomials
with degree up to 4: the quadratic formula, the cubic formula, and the quartic formula.

If we have a root $a$ for a polynomial $f(x)$, we divide $f(x)$ by $x - a$ (either by polynomial long
division or synthetic division) and we are left with a polynomial of smaller degree whose
roots are the other roots of $f$. We can use that result together with the rational root theorem
to find a rational root if one exists, and then get a polynomial of smaller degree whose
remaining roots may be easier to find.


Considering the general case of functions y = f(x) (not necessarily polynomials) there are 
several numerical methods (like Newton’s method) to approximate roots. This could be 
handy too for polynomials whose roots are not 
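As an illustration of the numerical route, here is a minimal Newton iteration in Python. The tolerance, the iteration cap, and the function name are arbitrary choices of this sketch, and the derivative must be supplied by the caller.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Approximate a root of f starting from x0, using the derivative df."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x = x - fx / df(x)  # Newton step: follow the tangent line to zero
    raise RuntimeError("Newton's method did not converge")

real_root = newton(lambda x: x * x - 4, lambda x: 2 * x, 3.0)
complex_root = newton(lambda x: x * x + 1, lambda x: 2 * x, 1 + 1j)
print(real_root, complex_root)
```

The second call starts from a complex guess and converges to $i$, matching the example $f(x) = x^2 + 1$ above; a real starting point could never leave the real line.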
159.11 variant of Cardano's derivation


By a linear change of variable, a cubic polynomial over $\mathbb{C}$ can be given the form $x^3 + 3bx + c$.
To find the zeros of this cubic in the form of surds in $b$ and $c$, make the substitution $x =
y^{1/3} + z^{1/3}$, thus replacing one unknown with two, and then write down identities which are
suggested by the resulting equation in two unknowns. Specifically, we get
$$y + 3(y^{1/3} + z^{1/3})y^{1/3}z^{1/3} + z + 3b(y^{1/3} + z^{1/3}) + c = 0. \qquad (159.11.1)$$

This will be true if
$$y + z + c = 0 \qquad (159.11.2)$$
$$3y^{1/3}z^{1/3} + 3b = 0, \qquad (159.11.3)$$
which in turn requires
$$yz = -b^3. \qquad (159.11.4)$$

The pair of equations (159.11.2) and (159.11.4) is a quadratic system in $y$ and $z$, readily solved. But notice
that (159.11.3) puts a restriction on a certain choice of
Chapter 160


12D99 — Miscellaneous 


160.1 Archimedean property 


Let $x$ be any real number. Then there exists a natural number $n$ such that $n > x$.


This theorem is known as the Archimedean property of real numbers. It is also sometimes
called the axiom of Archimedes, although this name is doubly deceptive: it is neither
an axiom (it is rather a consequence of the least upper bound property) nor attributed to
Archimedes (in fact, Archimedes credits it to Eudoxus).


Proof. Let $x$ be a real number, and let $S = \{a \in \mathbb{N} : a \le x\}$. If $S$ is empty, let $n = 1$; note that
$x < n$ (otherwise $1 \in S$).

Assume $S$ is nonempty. Since $S$ has an upper bound, $S$ must have a least upper bound; call
it $b$. Now consider $b - 1$. Since $b$ is the least upper bound, $b - 1$ cannot be an upper bound
of $S$; therefore, there exists some $y \in S$ such that $y > b - 1$. Let $n = y + 1$; then $n > b$. But
$y$ is a natural, so $n$ must also be a natural. Since $n > b$, we know $n \notin S$; since $n \notin S$, we
know $n > x$. Thus we have a natural greater than $x$.


Corollary 4. If $x$ and $y$ are real numbers with $x > 0$, there exists a natural $n$ such that
$nx > y$.

Proof. Since $x$ and $y$ are reals, and $x \neq 0$, $y/x$ is a real. By the Archimedean property, we can
choose an $n \in \mathbb{N}$ such that $n > y/x$. Then $nx > y$.


Corollary 5. If $w$ is a real number greater than 0, there exists a natural $n$ such that $0 <
1/n < w$.

Proof. Using Corollary 4, choose $n \in \mathbb{N}$ satisfying $nw > 1$. Then $0 < 1/n < w$.


Corollary 6. If $x$ and $y$ are real numbers with $x < y$, there exists a rational number $a$ such
that $x < a < y$.

Proof. First examine the case where $0 \le x$. Using Corollary 5, find a natural $n$ satisfying
$0 < 1/n < (y - x)$. Let $S = \{m \in \mathbb{N} : m/n \ge y\}$. By Corollary 4 $S$ is non-empty, so
let $m_0$ be the least element of $S$ and let $a = (m_0 - 1)/n$. Then $a < y$. Furthermore, since
$y \le m_0/n$, we have $y - 1/n \le a$; and $x < y - 1/n \le a$. Thus $a$ satisfies $x < a < y$.

Now examine the case where $x < 0 < y$. Take $a = 0$.

Finally consider the case where $x < y \le 0$. Using the first case, let $b$ be a rational satisfying
$-y < b < -x$. Then let $a = -b$.
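The proof of the last corollary is constructive, and its three cases can be followed step by step in Python. In this sketch the inputs are converted to exact `Fraction` values (so floats are taken at their exact binary value); the function name is an assumption.

```python
from fractions import Fraction
from math import ceil, floor

def rational_between(x, y):
    """Return a rational a with x < a < y, following the three cases of the proof."""
    x, y = Fraction(x), Fraction(y)
    if not x < y:
        raise ValueError("need x < y")
    if x < 0 < y:
        return Fraction(0)
    if y <= 0:
        return -rational_between(-y, -x)  # reduce to the case 0 <= x
    n = floor(1 / (y - x)) + 1            # natural n with 1/n < y - x
    m0 = ceil(y * n)                      # least natural m with m/n >= y
    return Fraction(m0 - 1, n)

print(rational_between(Fraction(1, 3), Fraction(1, 2)))
```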
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160.2 complex 


There are some [polynomial] equations that don’t have solutions. Examples of these 
are £? +5 = 0, x£? +x +1 {= 0. Mathematically we express this by saying that R is not an 


algebraically closed) field! 


Inlorder|to solve that kind of equation, we have to ”extend” our number system. by adding a 
number 7 that has the|property| that i? = —1. In this way we extend the field of real numbers 
R to a field C whose elements are called complex numbers. The formal construction can 
be seen at [complex numbers}. The field C is algebraically closed: every polynomial with 
complex coefficients, and therefore every polynomial with real coefficients, has at least one 


(which might be real as well). 


Any complex number can be written as $z = x + iy$ (with $x, y \in \mathbb{R}$). Here we call $x$ the real
part of $z$ and $y$ the imaginary part of $z$. We write this as
$x = \mathrm{Re}(z)$, $y = \mathrm{Im}(z)$


Real numbers are a subset of complex numbers, and a real number $r$ can be written also as
$r + i0$. Thus, a complex number is real if and only if its imaginary part is equal to zero.


By writing $x + iy$ as $(x, y)$ we can also look at complex numbers as ordered pairs. With this
notation, real numbers are the pairs of the form $(r, 0)$.


The rules of addition and multiplication for complex numbers are:
$(a + ib) + (x + iy) = (a + x) + i(b + y)$, that is, $(a, b) + (x, y) = (a + x, b + y)$
$(a + ib)(x + iy) = (ax - by) + i(ay + bx)$, that is, $(a, b)(x, y) = (ax - by, ay + bx)$


(to see why the last identity holds, expand the first product and then simplify by using
$i^2 = -1$).


We also have negatives: $-(a, b) = (-a, -b)$, and multiplicative inverses:


$(a, b)^{-1} = \left( \frac{a}{a^2 + b^2}, \frac{-b}{a^2 + b^2} \right)$ for $(a, b) \ne (0, 0)$.
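The ordered-pair rules can be tried out directly. The sketch below is an illustration under assumptions: the helper names `add`, `mul` and `inverse` are hypothetical, and exact `Fraction` components are used so that the check of $z z^{-1} = (1, 0)$ involves no rounding.

```python
from fractions import Fraction

def add(p, q):
    """(a, b) + (x, y) = (a + x, b + y)."""
    (a, b), (x, y) = p, q
    return (a + x, b + y)

def mul(p, q):
    """(a, b)(x, y) = (ax - by, ay + bx)."""
    (a, b), (x, y) = p, q
    return (a * x - b * y, a * y + b * x)

def inverse(p):
    """Multiplicative inverse of a nonzero pair (a, b)."""
    a, b = p
    d = a * a + b * b          # nonzero unless (a, b) == (0, 0)
    return (a / d, -b / d)

i = (Fraction(0), Fraction(1))
z = (Fraction(3), Fraction(4))
print(mul(i, i))               # the pair representing i^2
print(mul(z, inverse(z)))      # the pair representing z * z^{-1}
```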
Seeing complex numbers as ordered pairs also lets us give $\mathbb{C}$ the structure of a vector space
(over $\mathbb{R}$). The norm of $z = x + iy$ is defined as


$|z| = \sqrt{x^2 + y^2}$.


Then we have $|z|^2 = z\bar{z}$ where $\bar{z}$ is the conjugate of $z = x + iy$, defined as $\bar{z} = x - iy$.
Thus we can also characterize real numbers as those complex numbers $z$ such that $z = \bar{z}$.


Conjugation obeys the following rules:


$\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$
$\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$
$\overline{\bar{z}} = z$


The ordered-pair notation lets us visualize complex numbers as points in the plane, but then
we can also describe complex numbers with polar coordinates:
if $z = a + ib$ is represented in polar coordinates as $(r, t)$ we call $r$ the modulus of $z$ and $t$
its argument.


If $z = a + ib = (r, t)$, then $a = r\cos t$ and $b = r\sin t$. So we have the following expression,
called the polar form of $z$:
$z = a + ib = r(\cos t + i\sin t)$


Multiplication of complex numbers can be done in a very neat way using polar coordinates:
$(r_1, t_1)(r_2, t_2) = (r_1 r_2, t_1 + t_2)$.
The latter expression proves de Moivre's theorem.
160.3 complex conjugate


160.3.1 Definition 


Scalar Complex Conjugate 


Let $z$ be a complex number with real part $a$ and imaginary part $b$,


$z = a + bi$


Then the complex conjugate of $z$ is


$\bar{z} = a - bi$


Complex conjugation represents a reflection about the real axis on the Argand diagram
representing a complex number.


Sometimes a star ($*$) is used instead of an overline, e.g. in physics you might see


$\int_{-\infty}^{\infty} \Psi^* \Psi \, dx = 1$


where $\Psi^*$ is the complex conjugate of a wave function.


160.3.2 Matrix Complex Conjugate


Let $A = (a_{ij})$ be an $n \times m$ matrix with complex entries. Then the complex conjugate of
$A$ is the matrix $\bar{A} = (\bar{a}_{ij})$. In particular, if $v = (v^1, \ldots, v^n)$ is a complex row/column vector, then
$\bar{v} = (\bar{v}^1, \ldots, \bar{v}^n)$.


Hence, the matrix complex conjugate is what we would expect: the same matrix with all of
its entries conjugated.


160.3.3 Properties of the Complex Conjugate


Scalar Properties


If $u, v$ are complex numbers, then


5. If $v \ne 0$, then $\overline{(u/v)} = \bar{u}/\bar{v}$.


6. Let $u = a + bi$. Then $\bar{u}u = u\bar{u} = a^2 + b^2 \ge 0$ (the square of the complex modulus).


7. If $z$ is written in polar form as $z = re^{i\phi}$, then $\bar{z} = re^{-i\phi}$.


160.3.4 Matrix and Vector Properties 


Let $A$ be a matrix with complex entries, and let $v$ be a complex row/column vector.


Then
1. $\overline{A^T} = (\bar{A})^T$
2. $\overline{Av} = \bar{A}\bar{v}$, and $\overline{vA} = \bar{v}\bar{A}$. (Here we assume that $A$ and $v$ are of compatible size.)


Now assume further that $A$ is a complex square matrix. Then


1. $\mathrm{trace}\, \bar{A} = \overline{(\mathrm{trace}\, A)}$
160.4 complex number


The ring of complex numbers $\mathbb{C}$ is defined to be the quotient ring of the polynomial ring
$\mathbb{R}[X]$ in one variable over the reals by the principal ideal $(X^2 + 1)$. For $a, b \in \mathbb{R}$, the
equivalence class of $a + bX$ in $\mathbb{C}$ is usually denoted $a + bi$, and one has $i^2 = -1$.


The complex numbers form an algebraically closed field. There is a standard metric on the
complex numbers, defined by


$d(a_1 + b_1 i, a_2 + b_2 i) = \sqrt{(a_2 - a_1)^2 + (b_2 - b_1)^2}$.
160.5 examples of totally real fields


Here we present examples of totally real fields, totally imaginary fields and CM-fields.


Examples:


1. Let $K = \mathbb{Q}(\sqrt{d})$ with $d$ a square-free positive integer. Then
$\Sigma_K = \{\mathrm{Id}_K, \sigma\}$
where $\mathrm{Id}_K : K \to \mathbb{C}$ is the identity map ($\mathrm{Id}_K(k) = k$, for all $k \in K$), whereas
$\sigma : K \to \mathbb{C}, \quad \sigma(a + b\sqrt{d}) = a - b\sqrt{d}$
Since $\sqrt{d} \in \mathbb{R}$ it follows that $K$ is a totally real field.
2. Similarly, let $K = \mathbb{Q}(\sqrt{d})$ with $d$ a square-free negative integer. Then
$\Sigma_K = \{\mathrm{Id}_K, \sigma\}$
where $\mathrm{Id}_K : K \to \mathbb{C}$ is the identity map ($\mathrm{Id}_K(k) = k$, for all $k \in K$), whereas
$\sigma : K \to \mathbb{C}, \quad \sigma(a + b\sqrt{d}) = a - b\sqrt{d}$
Since $\sqrt{d} \in \mathbb{C}$ and it is not in $\mathbb{R}$, it follows that $K$ is a totally imaginary field.


3. Let $\zeta_n$, $n \ge 3$, be a primitive $n$-th root of unity and let $L = \mathbb{Q}(\zeta_n)$, a cyclotomic extension.
Note that the only roots of unity that are real are $\pm 1$. If $\psi : L \to \mathbb{C}$ is an embedding
then $\psi(\zeta_n)$ must be a conjugate of $\zeta_n$, i.e. one of
$\{\zeta_n^a \mid a \in (\mathbb{Z}/n\mathbb{Z})^\times\}$
but those are all imaginary. Thus $\psi(L) \nsubseteq \mathbb{R}$. Hence $L$ is a totally imaginary field.
4. In fact, $L$ as in (3) is a CM-field. Indeed, the maximal real subfield of $L$ is
$F = \mathbb{Q}(\zeta_n + \zeta_n^{-1})$
Notice that the minimal polynomial of $\zeta_n$ over $F$ is
$X^2 - (\zeta_n + \zeta_n^{-1})X + 1$
so we obtain $L$ from $F$ by adjoining the square root of the discriminant of this polynomial,
which is
$\zeta_n^2 + \zeta_n^{-2} - 2 = 2\cos(4\pi/n) - 2 < 0$
and any other conjugate is
$\zeta_n^{2a} + \zeta_n^{-2a} - 2 = 2\cos(4a\pi/n) - 2 < 0, \quad a \in (\mathbb{Z}/n\mathbb{Z})^\times$
Hence, $L$ is a CM-field.
5. Notice that any quadratic imaginary number field is obviously a CM-field.
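The negativity claim in example 4 can be checked numerically for small $n$. The sketch below is an illustration only: it assumes the particular choice $\zeta_n = e^{2\pi i/n}$, uses floating-point arithmetic, and the function name `discriminant_conjugates` is hypothetical.

```python
import cmath
import math

def discriminant_conjugates(n):
    """Values zeta^(2a) + zeta^(-2a) - 2 for a coprime to n, with zeta = exp(2*pi*i/n)."""
    zeta = cmath.exp(2j * math.pi / n)
    return [zeta ** (2 * a) + zeta ** (-2 * a) - 2
            for a in range(1, n) if math.gcd(a, n) == 1]

for n in range(3, 13):
    for value in discriminant_conjugates(n):
        # each conjugate should be real (up to rounding) and negative
        assert abs(value.imag) < 1e-9 and value.real < 0
print("all conjugates are negative for n = 3, ..., 12")
```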
160.6 fundamental theorem of algebra


Let $f : \mathbb{C} \to \mathbb{C}$ be a non-constant polynomial. Then there is $z \in \mathbb{C}$ with $f(z) = 0$.
In other words, $\mathbb{C}$ is algebraically closed.
160.7 imaginary


A complex number $c \in \mathbb{C}$ is called imaginary if its real part is 0.
All complex numbers may be written as $c = a + bi$ where $i$ is the imaginary unit $i = \sqrt{-1}$ and
$a, b \in \mathbb{R}$. An imaginary number can be written as $c = bi$, and because of this is sometimes
called a pure complex number.


The imaginary numbers are closed under addition but not under multiplication.
160.8 imaginary unit


The imaginary unit $i := \sqrt{-1}$. Any imaginary number $m$ may be written as $m = bi$,
$b \in \mathbb{R}$. Any complex number $c \in \mathbb{C}$ may be written as $c = a + bi$, $a, b \in \mathbb{R}$.


Note that there are two complex square roots of $-1$ (i.e. the two solutions to the equation
$x^2 + 1 = 0$ in $\mathbb{C}$), so there is always some ambiguity in which of these we choose to call
"$i$" and which we call "$-i$", though this has little bearing on any applications of complex
numbers.
160.9 indeterminate form


The expression


$\frac{0}{0}$


is known as the indeterminate form. The motivation for this name is that there are no
rules for comparing the value of $\frac{0}{0}$ to the other real numbers. Note that, for example, $\frac{1}{0}$ is
not indeterminate, since we can justifiably associate it with $+\infty$, which does compare with
the rest of the real numbers (in particular, it is defined to be greater than all of them).


Although $\frac{0}{0}$ is called "the" indeterminate form, another indeterminate form is


$\frac{\infty}{\infty}$


for the same motivating reasons.
160.10 inequalities for real numbers


Suppose $a$ and $b$ are real numbers. Then we have four types of inequalities between $a$ and $b$:


1. The inequality $a < b$ means that $a - b$ is negative.
2. The inequality $a > b$ means that $a - b$ is positive.
3. The inequality $a \le b$ means that $a - b$ is non-positive.
4. The inequality $a \ge b$ means that $a - b$ is non-negative.


The first two inequalities are also called strict inequalities.


Properties


Suppose $a$ and $b$ are real numbers.


1. If $a > b$, then $-a < -b$. If $a < b$, then $-a > -b$.
2. If $a \ge b$, then $-a \le -b$. If $a \le b$, then $-a \ge -b$.
3. Suppose $a_0, a_1, \ldots$ is a sequence of real numbers converging to $a$, and suppose that
either $a_i < b$ or $a_i \le b$ for some real number $b$ for each $i$. Then $a \le b$.
Examples


1. The triangle inequality. If $a, b, c$ are real numbers, then


$\big| |a - c| - |b - c| \big| \le |a - b| \le |a - c| + |b - c|$.


2. Jordan's inequality
3. Young's inequality
4. Bernoulli's inequality
5. Nesbitt's inequality
6. Shapiro inequality


Inequalities for sequences


1. Arithmetic-geometric-harmonic means inequality and the general means inequality


Geometric inequalities 


1. Hadwiger-Finsler inequality
2. Weitzenböck's inequality
3. Brunn-Minkowski inequality


Matrix inequalities


1. Schur's inequality
160.11 interval


Loosely speaking, an interval is a part of the real numbers that starts at one number and
stops at another number. For instance, all numbers greater than 1 and smaller than 2 form
an interval. Another interval is formed by the numbers greater than or equal to 1 and smaller
than 2. Thus, when talking about intervals, it is necessary to specify whether the endpoints are
part of the interval or not. There are then four types of intervals with three different names:
open, closed and half-open. Let us next define these precisely.


1. The open interval contains neither of the endpoints. If $a < b$ are real numbers, then
the open interval of numbers between $a$ and $b$ is written as $(a, b)$ and


$(a, b) = \{x \in \mathbb{R} \mid a < x < b\}$.


2. The closed interval contains both endpoints. If $a < b$ are real numbers, then the closed
interval is written as $[a, b]$ and


$[a, b] = \{x \in \mathbb{R} \mid a \le x \le b\}$.


3. A half-open interval contains only one of the endpoints. If $a < b$ are real numbers, the
half-open intervals $(a, b]$ and $[a, b)$ are defined as
$(a, b] = \{x \in \mathbb{R} \mid a < x \le b\}$,
$[a, b) = \{x \in \mathbb{R} \mid a \le x < b\}$.


Infinite intervals 


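If we allow either (or both) of $a$ and $b$ to be infinite, then we define


$(a, \infty) = \{x \in \mathbb{R} \mid x > a\}$,
$[a, \infty) = \{x \in \mathbb{R} \mid x \ge a\}$,
$(-\infty, a) = \{x \in \mathbb{R} \mid x < a\}$,
$(-\infty, a] = \{x \in \mathbb{R} \mid x \le a\}$,
$(-\infty, \infty) = \mathbb{R}$.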


Note on naming and notation 


In [1], an open interval is always called a segment, and a closed interval is called simply
an interval. However, the above naming with open, closed, and half-open interval seems to
be more widely adopted. See e.g. [2]. To distinguish between $[a, b)$ and $(a, b]$, the former
is sometimes called a right half-open interval and the latter a left half-open interval
[6]. The notation $(a, b)$, $[a, b)$, $(a, b]$, $[a, b]$ seems to be standard. However, some authors
(especially from the French school) use the notation $]a, b[$, $[a, b[$, $]a, b]$, $[a, b]$ as opposed to
$(a, b)$, $[a, b)$, $(a, b]$, $[a, b]$.




REFERENCES 


1. W. Rudin, Principles of Mathematical Analysis, McGraw-Hill Inc., 1976.
2. W. Rudin, Real and complex analysis, 3rd ed., McGraw-Hill Inc., 1987.
3. R. Adams, Calculus, a complete course, Addison-Wesley Publishers Ltd., 3rd ed., 1995.
4. L. Råde, B. Westergren, Mathematics Handbook for Science and Engineering, Studentlitteratur, 1995.
5. R.A. Silverman, Introductory Complex Analysis, Dover Publications, 1972.
6. S. Igari, Real analysis - With an introduction to Wavelet Theory, American Mathematical Society, 1998.


Version: 2 Owner: mathcam Author(s): matte 


160.12 modulus of complex number 


Definition Let $z$ be a complex number, and let $\bar{z}$ be the complex conjugate of $z$. Then the
modulus, or absolute value, of $z$ is defined as [1]


$|z| = \sqrt{z\bar{z}}$.


If we write $z$ in polar form as $z = re^{i\phi}$ with $r \ge 0$, $\phi \in [0, 2\pi)$, then $|z| = r$. It follows
that the modulus is a positive real number or zero. Alternatively, if $a$ and $b$ are the real
respectively imaginary parts of $z$, then
$|z| = \sqrt{a^2 + b^2}$, (160.12.1)
which is simply the Euclidean norm of the point $(a, b) \in \mathbb{R}^2$. It follows that the modulus
satisfies the triangle inequality, i.e., property 2 below. Other properties of the modulus are
as follows [1, 2]: If $u, v \in \mathbb{C}$, then


1. $|u| \ge 0$, with $|u| = 0$ if and only if $u = 0$.
2. $|u + v| \le |u| + |v|$.
3. $|uv| = |u||v|$.
4. For any $n = 1, 2, \ldots$, we have that $|u^n| = |u|^n$.
5. If $v \ne 0$, then $|u/v| = |u|/|v|$.
6. $|\bar{u}| = |u|$.
7. $|u|$ is a strictly increasing function of $|\mathrm{Re}\{u\}|$ and $|\mathrm{Im}\{u\}|$.

Properties 3 and 6 follow by writing $u$ and $v$ in polar form (see e.g. [2]). Property 5 follows
from properties 3 and 6 and the identity $1/v = \bar{v}/|v|^2$. Indeed,


$|u/v| = |u\bar{v}/|v|^2| = |u||\bar{v}|/|v|^2 = |u|/|v|$.
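Several of the listed properties can be spot-checked with Python's built-in `complex` type. This is only a numerical illustration for one hypothetical pair of sample values `u` and `v`, not a proof; comparisons use `math.isclose` because the arithmetic is floating-point.

```python
import math

u, v = complex(3, -4), complex(1, 2)

# definition: |u| = sqrt(u * conj(u))
assert math.isclose(abs(u), math.sqrt((u * u.conjugate()).real))
# property 2 (triangle inequality)
assert abs(u + v) <= abs(u) + abs(v)
# property 3
assert math.isclose(abs(u * v), abs(u) * abs(v))
# property 5
assert math.isclose(abs(u / v), abs(u) / abs(v))
# property 6
assert math.isclose(abs(u.conjugate()), abs(u))

print(abs(u))
```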




REFERENCES 


1. E. Kreyszig, Advanced Engineering Mathematics, John Wiley & Sons, 1993, 7th ed. 
2. E. Weisstein, Eric W. Weisstein’s world of mathematics, 


Version: 8 Owner: matte Author(s): matte 


160.13 proof of fundamental theorem of algebra 


If $f(x) \in \mathbb{C}[x]$ let $a$ be a root of $f(x)$ in some extension of $\mathbb{C}$. Let $K$ be a Galois closure of
$\mathbb{C}(a)$ over $\mathbb{R}$ and set $G = \mathrm{Gal}(K/\mathbb{R})$. Let $H$ be a Sylow 2-subgroup of $G$ and let $L = K^H$ (the
fixed field of $H$ in $K$). By the Fundamental Theorem of Galois Theory we have $[L : \mathbb{R}] =
[G : H]$, an odd number. We may write $L = \mathbb{R}(b)$ for some $b \in L$, so the minimal polynomial
$m_{b,\mathbb{R}}(x)$ is irreducible over $\mathbb{R}$ and of odd degree. Since every real polynomial of odd degree
has a real root (by the intermediate value theorem), that degree must be 1, and hence $L = \mathbb{R}$,
which means that $G = H$, a 2-group. Thus $G_1 = \mathrm{Gal}(K/\mathbb{C})$ is also a 2-group. If $G_1 \ne 1$
choose $G_2 \le G_1$ such that $[G_1 : G_2] = 2$, and set $M = K^{G_2}$, so that $[M : \mathbb{C}] = [G_1 : G_2] = 2$.
But any polynomial of degree 2 over $\mathbb{C}$ has roots in $\mathbb{C}$ by the quadratic formula, so such a
field $M$ cannot exist. This contradiction shows that $G_1 = 1$. Hence $K = \mathbb{C}$ and $a \in \mathbb{C}$,
completing the proof.
160.14 proof of the fundamental theorem of algebra


Let $f : \mathbb{C} \to \mathbb{C}$ be a polynomial, and suppose $f$ has no root in $\mathbb{C}$. We will show $f$ is constant.


Let $g = \frac{1}{f}$. Since $f$ is never zero, $g$ is defined and holomorphic on $\mathbb{C}$ (i.e. it is entire).
Moreover, if $f$ were a non-constant polynomial, we would have $|f(z)| \to \infty$ as $|z| \to \infty$, and so
$|g(z)| \to 0$ as $|z| \to \infty$. Then there would be some $M$ such that $|g(z)| < 1$ whenever $|z| > M$,
and $g$ is continuous and so bounded on the compact set $\{z \in \mathbb{C} : |z| \le M\}$.


So $g$ would be bounded and entire, and therefore by Liouville's theorem $g$ would be constant, making
$f$ constant, a contradiction. So $f$ is constant as required.
160.15 real and complex embeddings


Let $L$ be a subfield of $\mathbb{C}$.


Definition 9.


1. A real embedding of $L$ is an injective field homomorphism
$\sigma : L \to \mathbb{R}$


2. A (non-real) complex embedding of $L$ is an injective field homomorphism
$\tau : L \to \mathbb{C}$
such that $\tau(L) \nsubseteq \mathbb{R}$.


3. We denote by $\Sigma_L$ the set of all embeddings, real and complex, of $L$ in $\mathbb{C}$ (note that all of
them must fix $\mathbb{Q}$, since they are field homomorphisms).


Note that if $\sigma$ is a real embedding then $\bar{\sigma} = \sigma$, where $\bar{\ }$ denotes the complex conjugation
automorphism
$\bar{\ } : \mathbb{C} \to \mathbb{C}, \quad \overline{(a + bi)} = a - bi$
On the other hand, if $\tau$ is a complex embedding, then $\bar{\tau}$ is another complex embedding, so
the complex embeddings always come in pairs $\{\tau, \bar{\tau}\}$.


Let $K \subseteq L$ be another subfield of $\mathbb{C}$. Moreover, assume that $[L : K]$ is finite (this is the
dimension of $L$ as a vector space over $K$). We are interested in the embeddings of $L$ that fix
$K$ pointwise, i.e. embeddings $\psi : L \to \mathbb{C}$ such that


$\psi(k) = k, \quad \forall k \in K$


Theorem 15. For any embedding $\psi$ of $K$ in $\mathbb{C}$, there are exactly $[L : K]$ embeddings of $L$
such that they extend $\psi$. In other words, if $\varphi$ is one of them, then


$\varphi(k) = \psi(k), \quad \forall k \in K$
Thus, by taking $\psi = \mathrm{Id}_K$, there are exactly $[L : K]$ embeddings of $L$ which fix $K$ pointwise.


Hence, by the theorem, we know that the order of $\Sigma_L$ is $[L : \mathbb{Q}]$. The number $[L : \mathbb{Q}]$ is
usually decomposed as
$[L : \mathbb{Q}] = r_1 + 2r_2$


where $r_1$ is the number of embeddings which are real, and $2r_2$ is the number of embeddings
which are complex (non-real). Notice that by the remark above this number is always even,
so $r_2$ is an integer.


Remark: Let $\psi$ be an embedding of $L$ in $\mathbb{C}$. Since $\psi$ is injective, we have $\psi(L) \cong L$, so we
can regard $\psi$ as an automorphism of $L$. When $L/\mathbb{Q}$ is a Galois extension, we can prove that
$\Sigma_L \cong \mathrm{Gal}(L/\mathbb{Q})$, and hence proving in a different way the fact that


$|\Sigma_L| = [L : \mathbb{Q}] = |\mathrm{Gal}(L/\mathbb{Q})|$
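The decomposition $[L : \mathbb{Q}] = r_1 + 2r_2$ can be illustrated on a small example. The sketch below relies on a standard fact that is not stated in this entry and is therefore an assumption here: for $L = \mathbb{Q}(\alpha)$, the embeddings of $L$ in $\mathbb{C}$ correspond to the complex roots of the minimal polynomial of $\alpha$. The example field $\mathbb{Q}(\sqrt[3]{2})$, with minimal polynomial $x^3 - 2$, is chosen only for illustration.

```python
import cmath
import math

# Roots of x^3 - 2 (irreducible over Q): the real cube root of 2 times
# the three cube roots of unity.
c = 2 ** (1 / 3)
roots = [c * cmath.exp(2j * math.pi * k / 3) for k in range(3)]

r1 = sum(1 for z in roots if abs(z.imag) < 1e-9)   # real embeddings
r2 = (len(roots) - r1) // 2                        # conjugate pairs of complex embeddings

print(r1, r2, r1 + 2 * r2)
```

Here one root is real and the other two form a conjugate pair, so this field is neither totally real nor totally imaginary.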
Version: 1 Owner: alozano Author(s): alozano 




160.16 real number 


There are several definitions of real number, all in common use. We give one 
definition in detail and mention the other ones. 


A Cauchy sequence of rational numbers is a sequence $\{x_i\}$, $i = 0, 1, 2, \ldots$ of rational numbers
with the property that, for every rational number $\epsilon > 0$, there exists a natural number $N$ such
that, for all natural numbers $n, m > N$, the absolute value $|x_n - x_m|$ satisfies $|x_n - x_m| < \epsilon$.
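As an illustration of the definition, the sketch below builds the first few terms of a rational sequence and checks the Cauchy condition on them. The choices here are assumptions made for the example: the sequence is the Newton iteration $x_{i+1} = (x_i + 2/x_i)/2$ with $x_0 = 1$ (whose limit $\sqrt{2}$ is not rational), the helper name `newton_sqrt2` is hypothetical, and only finitely many terms are inspected, so this is a spot check rather than a proof.

```python
from fractions import Fraction

def newton_sqrt2(count):
    """First `count` terms of x_{i+1} = (x_i + 2/x_i)/2 with x_0 = 1, all rational."""
    x = Fraction(1)
    terms = []
    for _ in range(count):
        terms.append(x)
        x = (x + 2 / x) / 2
    return terms

xs = newton_sqrt2(8)
eps = Fraction(1, 10**6)
N = 3
# for the computed terms with index > N, all pairwise distances are below eps
assert all(abs(xs[n] - xs[m]) < eps
           for n in range(N + 1, len(xs)) for m in range(N + 1, len(xs)))
print(xs[:4])
```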


The set $\mathbb{R}$ of real numbers is the set of equivalence classes of Cauchy sequences of rational
numbers, under the equivalence relation $\{x_i\} \sim \{y_i\}$ if the interleave sequence of the two
sequences is itself a Cauchy sequence. The real numbers form a ring with addition and
multiplication defined by


• $\{x_i\} + \{y_i\} = \{(x_i + y_i)\}$
• $\{x_i\} \cdot \{y_i\} = \{(x_i \cdot y_i)\}$


There is an ordering relation on $\mathbb{R}$, defined by $\{x_i\} \le \{y_i\}$ if either $\{x_i\} \sim \{y_i\}$ or there
exists a natural number $N$ such that $x_n < y_n$ for all $n > N$.


One can prove that the real numbers form an ordered field and that they satisfy the least upper bound
property: for every nonempty subset $S \subset \mathbb{R}$, if $S$ has an upper bound then $S$ has a lowest upper bound.
It is also true that every ordered field with the least upper bound property is isomorphic to
$\mathbb{R}$.


Alternative definitions of the set of real numbers include: 


1. Equivalence classes of decimal sequences (sequences consisting of natural numbers be- 
tween 0 and 9, and a single decimal point), where two decimal sequences are equivalent 
if they are identical, or if one has an infinite tail of 9's, the other has an infinite tail of
0’s, and the leading portion of the first sequence is one lower than the leading portion 
of the second. 


2. Dedekind cuts of rational numbers (that is, subsets $S$ of $\mathbb{Q}$ with the property that, if
$a \in S$ and $b < a$, then $b \in S$).


3. The real numbers can also be defined as the unique (up to isomorphism) ordered field 


satisfying the least upper bound property. 


The real numbers are often described as the unique (up to isomorphism) complete ordered
field. While this fact is true, care must be taken when using it for the definition, because
the standard definition of "complete" is logically dependent on the notion of real number.
160.17 totally real and imaginary fields


For this entry, we follow the notation of the entry real and complex embeddings.


Let $K$ be a subfield of the complex numbers, $\mathbb{C}$, and let $\Sigma_K$ be the set of all embeddings of
$K$ in $\mathbb{C}$.


Definition 10. With $K$ as above:


1. $K$ is a totally real field if all embeddings $\psi \in \Sigma_K$ are real embeddings.
2. $K$ is a totally imaginary field if all embeddings $\psi \in \Sigma_K$ are (non-real) complex embeddings.
3. $K$ is a CM-field or complex multiplication field if $K$ is a totally imaginary quadratic
extension of a totally real field, i.e. $K$ is the extension obtained from a totally real
field by adjoining the square root of a number all of whose conjugates are negative.


Note: A complex number $\omega$ is real if and only if $\bar{\omega}$, the complex conjugate of $\omega$, equals $\omega$:


$\omega \in \mathbb{R} \Leftrightarrow \omega = \bar{\omega}$


Thus, a field $K$ which is fixed pointwise by complex conjugation is real, that is, contained in $\mathbb{R}$. Given a
field $L$, the subfield of $L$ fixed pointwise by complex conjugation is called the maximal real
subfield of $L$.


For examples (of (1), (2) and (3)), see examples of totally real fields.
Chapter 161


12E05 — Polynomials (irreducibility, 
etc.) 


161.1 Gauss’s Lemma I 


There are a few different things that are sometimes called "Gauss's Lemma". See also Gauss's Lemma II.


Gauss's Lemma I: If $R$ is a UFD and $f(x)$ and $g(x)$ are both primitive polynomials in $R[x]$, so is
$f(x)g(x)$.


Proof: Suppose $f(x)g(x)$ is not primitive. We will show that either $f(x)$ or $g(x)$ is not primitive either.
That $f(x)g(x)$ is not primitive means the gcd of the coefficients of $f(x)g(x)$ is not a unit. Let $p$
be a prime factor of that gcd. We consider the image of $R$ mod $p$, i.e. under the natural
homomorphism $\theta : R \to R/pR$, and extend it to the polynomial rings $R[x] \to (R/pR)[x]$.
Since $p$ is prime, $R/pR$ is an integral domain, so $(R/pR)[x]$ is an integral
domain. And we have


$\bar{f}(x)\bar{g}(x) = 0$


where $\bar{f}(x)$ is the image of $f(x)$ in $(R/pR)[x]$, and similarly $\bar{g}(x)$. So $\bar{f}(x) = 0$ or $\bar{g}(x) = 0$. So
$f(x)$ or $g(x)$ is divisible by $p$, so one of them is not primitive.
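For $R = \mathbb{Z}$ the lemma can be observed on a concrete pair of polynomials. The sketch below is an illustration under assumptions: polynomials are stored as integer coefficient lists with the constant term first, and `content` and `multiply` are hypothetical helper names. A polynomial over $\mathbb{Z}$ is primitive exactly when its content is 1.

```python
from functools import reduce
from math import gcd

def content(poly):
    """gcd of the integer coefficients (poly is a list, constant term first)."""
    return reduce(gcd, poly)

def multiply(f, g):
    """Product of two polynomials given as coefficient lists."""
    product = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            product[i + j] += a * b
    return product

f = [6, 10, 15]      # 6 + 10x + 15x^2, primitive although no coefficient is a unit
g = [4, 9]           # 4 + 9x, primitive
print(content(f), content(g), content(multiply(f, g)))
```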
161.2 Gauss's Lemma II


Definition: A polynomial $p(x) = a_n x^n + \ldots + a_0$ over a ring $R$ is said to be primitive if its
coefficients are not all divisible by any element of $R$ other than a unit.


(Gauss): Let $R$ be a UFD and $F$ its field of fractions. If a polynomial $p \in R[x]$
is reducible in $F[x]$, then it is reducible in $R[x]$.


Proof: We may assume that $p$ is primitive. Suppose $p = qr$ with $q, r \in F[x]$. There are unique
elements $a, b \in F$ such that $q/a$ and $r/b$ are in $R[x]$ and are primitive. But $p/ab = (q/a)(r/b)$.
Since $p$ is primitive, it follows from Gauss's Lemma I that $ab$ is a unit, and therefore so are
$a$ and $b$. This completes the proof.


Remark: Another result with the same name is Gauss' lemma on quadratic residues.
161.3 discriminant


Summary. The discriminant of a given polynomial is a number, calculated from the
coefficients of that polynomial, that vanishes if and only if that polynomial has one or more
multiple roots. Using the discriminant we can test for the presence of multiple roots, without
having to actually calculate the roots of the polynomial in question.


Definition. The discriminant of order $n \in \mathbb{N}$ is the polynomial, denoted here by $\delta =
\delta^{(n)}(a_1, \ldots, a_n)$, characterized by the following relation:


$\delta^{(n)}(s_1, s_2, \ldots, s_n) = \prod_{i=1}^{n} \prod_{j=i+1}^{n} (x_i - x_j)^2$, (161.3.1)


where
$s_k = s_k(x_1, \ldots, x_n), \quad k = 1, \ldots, n$


is the $k$-th elementary symmetric polynomial.


The above relation is a defining one, because the right-hand side of (161.3.1) is, evidently, a
symmetric polynomial, and because the algebra of symmetric polynomials is freely
generated by the basic symmetric polynomials, i.e. every symmetric polynomial arises in a
unique fashion as a polynomial of $s_1, \ldots, s_n$.


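Proposition 1. Up to sign, the discriminant is given by the determinant of a $(2n - 1)$-square matrix
with columns 1 to $n - 1$ formed by shifting the sequence $1, a_1, \ldots, a_n$, and
columns $n$ to $2n - 1$ formed by shifting the sequence $n, (n - 1)a_1, \ldots, a_{n-1}$, i.e.


\[
\delta^{(n)} =
\begin{vmatrix}
1 & 0 & \cdots & 0 & n & 0 & \cdots & 0 \\
a_1 & 1 & \cdots & 0 & (n-1)a_1 & n & \cdots & 0 \\
a_2 & a_1 & \cdots & 0 & (n-2)a_2 & (n-1)a_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{n-1} & 0 & 0 & \cdots & 2a_{n-2} \\
0 & 0 & \cdots & a_n & 0 & 0 & \cdots & a_{n-1}
\end{vmatrix}
\qquad (161.3.2)
\]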

Multiple root test. Let $K$ be a field, let $x$ denote an indeterminate, and let


$p = x^n + a_1 x^{n-1} + \ldots + a_{n-1} x + a_n, \quad a_i \in K$
be a monic polynomial over $K$. We define $\delta[p]$, the discriminant of $p$, by setting


$\delta[p] = \delta^{(n)}(a_1, \ldots, a_n)$.


The discriminant of a non-monic polynomial is defined by homogenizing the above definition,
i.e. by setting
$\delta[ap] = a^{2n-2}\delta[p], \quad a \in K$.


Proposition 2. The discriminant vanishes if and only if $p$ has multiple roots in its splitting field.


Proof. It isn't hard to show that a polynomial has multiple roots if and only if that polynomial
and its derivative share a common root. The desired conclusion now follows by observing
that the determinant in equation (161.3.2) gives the resolvent of a polynomial and
its derivative. This resolvent vanishes if and only if the polynomial in question has a multiple
root. Q.E.D.


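Some Examples. Here are the first few discriminants.


$\delta^{(1)} = 1$

$\delta^{(2)} = a_1^2 - 4a_2$

$\delta^{(3)} = 18 a_1 a_2 a_3 + a_1^2 a_2^2 - 4 a_2^3 - 4 a_1^3 a_3 - 27 a_3^2$

$\delta^{(4)} = a_1^2 a_2^2 a_3^2 - 4 a_2^3 a_3^2 - 4 a_1^3 a_3^3 + 18 a_1 a_2 a_3^3 - 27 a_3^4$
$\quad - 4 a_1^2 a_2^3 a_4 + 16 a_2^4 a_4 + 18 a_1^3 a_2 a_3 a_4 - 80 a_1 a_2^2 a_3 a_4$
$\quad - 6 a_1^2 a_3^2 a_4 + 144 a_2 a_3^2 a_4 - 27 a_1^4 a_4^2 + 144 a_1^2 a_2 a_4^2$
$\quad - 128 a_2^2 a_4^2 - 192 a_1 a_3 a_4^2 + 256 a_4^3$


Here is the matrix used to calculate $\delta^{(4)}$:


\[
\delta^{(4)} =
\begin{vmatrix}
1 & 0 & 0 & 4 & 0 & 0 & 0 \\
a_1 & 1 & 0 & 3a_1 & 4 & 0 & 0 \\
a_2 & a_1 & 1 & 2a_2 & 3a_1 & 4 & 0 \\
a_3 & a_2 & a_1 & a_3 & 2a_2 & 3a_1 & 4 \\
a_4 & a_3 & a_2 & 0 & a_3 & 2a_2 & 3a_1 \\
0 & a_4 & a_3 & 0 & 0 & a_3 & 2a_2 \\
0 & 0 & a_4 & 0 & 0 & 0 & a_3
\end{vmatrix}
\]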
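The third-order formula among the examples can be compared with the defining product of squared root differences. The sketch below is an illustration under assumptions: it takes a monic cubic $x^3 + a_1 x^2 + a_2 x + a_3$ built from hypothetical integer roots, and the helper names are not from the text.

```python
from itertools import combinations

def delta3(a1, a2, a3):
    """Third-order discriminant for x^3 + a1 x^2 + a2 x + a3."""
    return (18 * a1 * a2 * a3 + a1**2 * a2**2 - 4 * a2**3
            - 4 * a1**3 * a3 - 27 * a3**2)

def monic_cubic_from_roots(x1, x2, x3):
    """Coefficients (a1, a2, a3) of (x - x1)(x - x2)(x - x3)."""
    return (-(x1 + x2 + x3), x1 * x2 + x1 * x3 + x2 * x3, -(x1 * x2 * x3))

def squared_differences(roots):
    """Product of (x_i - x_j)^2 over all pairs i < j."""
    result = 1
    for xi, xj in combinations(roots, 2):
        result *= (xi - xj) ** 2
    return result

roots = (1, 2, 4)
print(delta3(*monic_cubic_from_roots(*roots)), squared_differences(roots))
```

With a repeated root, for instance `(1, 1, 3)`, both quantities vanish, which is the multiple root test of Proposition 2 in miniature.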
161.4 polynomial ring


Let $R$ be a ring. The polynomial ring over $R$ in one variable $X$ is the set $R[X]$ of all sequences
in $R$ with only finitely many nonzero terms. If $(a_0, a_1, a_2, a_3, \ldots)$ is an element in $R[X]$, with
$a_n = 0$ for all $n > N$, then we usually write this element as


$\sum_{n=0}^{N} a_n X^n = a_0 + a_1 X + a_2 X^2 + \cdots + a_N X^N$


Addition and multiplication in $R[X]$ are defined by


$\sum_{n=0}^{N} a_n X^n + \sum_{n=0}^{N} b_n X^n = \sum_{n=0}^{N} (a_n + b_n) X^n$ (161.4.1)

$\sum_{n=0}^{N} a_n X^n \cdot \sum_{n=0}^{N} b_n X^n = \sum_{n=0}^{2N} \left( \sum_{k=0}^{n} a_k b_{n-k} \right) X^n$ (161.4.2)


$R[X]$ is a ring under these operations.


The polynomial ring over $R$ in two variables $X, Y$ is defined to be $R[X, Y] := R[X][Y]$. In
three variables, we have $R[X, Y, Z] := R[X, Y][Z] = R[X][Y][Z]$, and in any finite number of
variables, we have inductively $R[X_1, X_2, \ldots, X_n] := R[X_1, \ldots, X_{n-1}][X_n] = R[X_1][X_2] \cdots [X_n]$.
161.5 resolvent


Summary. The resolvent of two polynomials is a number, calculated from the coefficients
of those polynomials, that vanishes if and only if the two polynomials share a common root.
Conversely, the resolvent is non-zero if and only if the two polynomials are mutually prime.


Definition. Let $K$ be a field and let


$p(x) = a_0 x^n + a_1 x^{n-1} + \ldots + a_n$,
$q(x) = b_0 x^m + b_1 x^{m-1} + \ldots + b_m$


be two polynomials over $K$ of degree $n$ and $m$, respectively. We define $\mathrm{Res}[p, q] \in K$, the
resolvent of $p(x)$ and $q(x)$, to be the determinant of an $(n + m)$-square matrix with columns 1
to $m$ formed by shifted sequences consisting of the coefficients of $p(x)$, and columns $m + 1$
to $n + m$ formed by shifted sequences consisting of coefficients of $q(x)$, i.e.


\[
\mathrm{Res}[p, q] =
\begin{vmatrix}
a_0 & 0 & \cdots & 0 & b_0 & 0 & \cdots & 0 \\
a_1 & a_0 & \cdots & 0 & b_1 & b_0 & \cdots & 0 \\
a_2 & a_1 & \cdots & 0 & b_2 & b_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{n-1} & 0 & 0 & \cdots & b_{m-1} \\
0 & 0 & \cdots & a_n & 0 & 0 & \cdots & b_m
\end{vmatrix}
\]

Proposition 3. The resolvent of two polynomials is non-zero if and only if the polynomials
are relatively prime.


Proof. Let $p(x), q(x) \in K[x]$ be two arbitrary polynomials of degree $n$ and $m$, respectively.
The polynomials are relatively prime if and only if every polynomial (including the
polynomial 1) can be formed as a linear combination of $p(x)$ and $q(x)$. Let


$r(x) = c_0 x^{m-1} + c_1 x^{m-2} + \ldots + c_{m-1}$,
$s(x) = d_0 x^{n-1} + d_1 x^{n-2} + \ldots + d_{n-1}$


be polynomials of degree $m - 1$ and $n - 1$, respectively. The coefficients of the linear
combination $r(x)p(x) + s(x)q(x)$ are given by the following matrix-vector multiplication:


\[
\begin{pmatrix}
a_0 & 0 & \cdots & 0 & b_0 & 0 & \cdots & 0 \\
a_1 & a_0 & \cdots & 0 & b_1 & b_0 & \cdots & 0 \\
a_2 & a_1 & \cdots & 0 & b_2 & b_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{n-1} & 0 & 0 & \cdots & b_{m-1} \\
0 & 0 & \cdots & a_n & 0 & 0 & \cdots & b_m
\end{pmatrix}
\begin{pmatrix}
c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_{m-1} \\ d_0 \\ d_1 \\ d_2 \\ \vdots \\ d_{n-1}
\end{pmatrix}
\]


In consequence of the preceding remarks, $p(x)$ and $q(x)$ are relatively prime if and only if
the matrix above is non-singular, i.e. the resolvent is non-vanishing. Q.E.D.
Alternative Characterization. The following describes the resolvent of two
polynomials in terms of the polynomials' roots. Indeed this uniquely characterizes
the resolvent, as can be seen by carefully studying the appended proof.


Proposition 4. Let $p(x)$, $q(x)$ be as above and let $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$ be their respective
roots in the algebraic closure of $K$. Then,


$\mathrm{Res}[p, q] = a_0^m b_0^n \prod_{i=1}^{n} \prod_{j=1}^{m} (x_i - y_j)$


Proof. The multilinearity property of determinants implies that


\[
\mathrm{Res}[p, q] = a_0^m b_0^n
\begin{vmatrix}
1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
A_1 & 1 & \cdots & 0 & B_1 & 1 & \cdots & 0 \\
A_2 & A_1 & \cdots & 0 & B_2 & B_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & A_{n-1} & 0 & 0 & \cdots & B_{m-1} \\
0 & 0 & \cdots & A_n & 0 & 0 & \cdots & B_m
\end{vmatrix}
\]
where
$A_i = \frac{a_i}{a_0}, \quad i = 1, \ldots, n$,
$B_j = \frac{b_j}{b_0}, \quad j = 1, \ldots, m$.


It therefore suffices to prove the proposition for monic polynomials. Without loss of
generality we can also assume that the roots in question are algebraically independent.


Thus, let $X_1, \ldots, X_n, Y_1, \ldots, Y_m$ be indeterminates and set


$F(X_1, \ldots, X_n, Y_1, \ldots, Y_m) = \prod_{i=1}^{n} \prod_{j=1}^{m} (X_i - Y_j)$,
$P(x) = (x - X_1) \cdots (x - X_n)$,
$Q(x) = (x - Y_1) \cdots (x - Y_m)$,
$G(X_1, \ldots, X_n, Y_1, \ldots, Y_m) = \mathrm{Res}[P, Q]$


Now by Proposition 3, $G$ vanishes if we replace any of the $Y_1, \ldots, Y_m$ by any of $X_1, \ldots, X_n$
and hence $F$ divides $G$.


Next, consider the main diagonal of the matrix whose determinant gives $\mathrm{Res}[P, Q]$. The first
$m$ entries of the diagonal are equal to 1, and the next $n$ entries are equal to $(-1)^m Y_1 \cdots Y_m$.
It follows that the expansion of $G$ contains a term of the form $(-1)^{mn} Y_1^n \cdots Y_m^n$. However,
the expansion of $F$ contains exactly the same term, and therefore $F = G$. Q.E.D.
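The determinant definition and the root formula of Proposition 4 can be compared on small examples. The sketch below is an illustration under assumptions: coefficient lists are given leading coefficient first, exact `Fraction` arithmetic is used, the determinant is computed by plain Gaussian elimination, and the helper names `det` and `resolvent` are hypothetical.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    size = len(m)
    result = Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result           # a row swap flips the sign
        result *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= factor * m[c][k]
    return result

def resolvent(p, q):
    """Res[p, q] for p = [a0, ..., an] and q = [b0, ..., bm], leading coefficients first."""
    n, m = len(p) - 1, len(q) - 1
    size = n + m
    matrix = [[Fraction(0)] * size for _ in range(size)]
    for j in range(m):                 # columns 1..m: shifted coefficients of p
        for i, a in enumerate(p):
            matrix[j + i][j] = Fraction(a)
    for j in range(n):                 # columns m+1..n+m: shifted coefficients of q
        for i, b in enumerate(q):
            matrix[j + i][m + j] = Fraction(b)
    return det(matrix)

p = [1, -3, 2]                         # (x - 1)(x - 2)
print(resolvent(p, [1, -3]))           # q = x - 3, no common root
print(resolvent(p, [1, -2]))           # q = x - 2, common root 2
```

For `p` and $q = x - 3$ the root formula gives $(1 - 3)(2 - 3) = 2$, and for $q = x - 2$ it gives 0 because of the shared root.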
161.6 de Moivre identity


From the Euler relation


$e^{i\theta} = \cos\theta + i\sin\theta$ (161.6.1)

it follows that
$e^{i\theta \cdot n} = (e^{i\theta})^n$ (161.6.2)
$\cos n\theta + i\sin n\theta = (\cos\theta + i\sin\theta)^n$. (161.6.3)


This is called de Moivre's formula, and besides being generally useful, it's a convenient
way to remember double- (and higher-multiple-) angle formulas. For example,


$\cos 2\theta + i\sin 2\theta = (\cos\theta + i\sin\theta)^2 = \cos^2\theta + 2i\sin\theta\cos\theta - \sin^2\theta$. (161.6.4)


Since the imaginary and real parts on each side must be equal, we must have


$\cos 2\theta = \cos^2\theta - \sin^2\theta$ and (161.6.5)
$\sin 2\theta = 2\sin\theta\cos\theta$. (161.6.6)
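De Moivre's formula and the resulting double-angle formulas can be checked numerically. The sketch below is only a floating-point spot check for one hypothetical angle `theta` and exponent `n`; the tolerance `1e-12` is an arbitrary choice.

```python
import math

theta, n = 0.7, 5

lhs = complex(math.cos(n * theta), math.sin(n * theta))
rhs = complex(math.cos(theta), math.sin(theta)) ** n
assert abs(lhs - rhs) < 1e-12          # (161.6.3)

# double-angle formulas (161.6.5) and (161.6.6)
assert abs(math.cos(2 * theta) - (math.cos(theta) ** 2 - math.sin(theta) ** 2)) < 1e-12
assert abs(math.sin(2 * theta) - 2 * math.sin(theta) * math.cos(theta)) < 1e-12
print("de Moivre's formula holds numerically for theta = 0.7, n = 5")
```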
161.7 monic


A monic polynomial is a polynomial with a leading coefficient of 1. That is, if $P_n(x)$ is a
polynomial of order $n$ in the variable $x$, then the coefficient of $x^n$ in $P_n(x)$ is 1.


For example, $x^5 + 3x^3 - 10x^2 + 1$ is a monic 5th-order polynomial. $3x^2 + 2x - 5$ is a 2nd-order
polynomial which is not monic.
161.8 Wedderburn's Theorem


A finite division ring is a field.


One of the many consequences of this theorem is that for a finite projective plane, Desargues'
Theorem implies Pappus' theorem.
161.9 proof of Wedderburn's theorem


We want to show that the multiplication operation in a finite division ring $D$ is abelian.
We denote the centralizer in $D$ of an element $x$ as $C_D(x)$.


Lemma. The centralizer is a subring.


0 and 1 are obviously elements of $C_D(x)$ and if $y$ and $z$ are, then $x(-y) = -(xy) = -(yx) =
(-y)x$, $x(y + z) = xy + xz = yx + zx = (y + z)x$ and $x(yz) = (xy)z = (yx)z = y(xz) =
y(zx) = (yz)x$, so $-y$, $y + z$, and $yz$ are also elements of $C_D(x)$. Moreover, for $y \ne 0$, $xy = yx$
implies $y^{-1}x = xy^{-1}$, so $y^{-1}$ is also an element of $C_D(x)$.


Now we consider the center of $D$ which we'll call $Z(D)$. This is also a subring and is in fact
the intersection of all centralizers.


$Z(D) = \bigcap_{x \in D} C_D(x)$


$Z(D)$ is an abelian subring of $D$ and is thus a field. We can consider $D$ and every $C_D(x)$
as vector spaces over $Z(D)$ of dimension $n$ and $n_x$, respectively. Since $D$ can be viewed as a
module over $C_D(x)$ we find that $n_x$ divides $n$. If we put $q := |Z(D)|$, we see that $q \ge 2$ since
$\{0, 1\} \subset Z(D)$, and that $|C_D(x)| = q^{n_x}$ and $|D| = q^n$.


It suffices to show that $n = 1$ to prove that multiplication is abelian, since then $|Z(D)| = |D|$
and so $Z(D) = D$.


We now consider $D^* := D - \{0\}$ and apply the conjugacy class formula


$|D^*| = |Z(D^*)| + \sum_x [D^* : C_{D^*}(x)]$


where the sum runs over representatives $x$ of the conjugacy classes with more than one element,
which gives


$q^n - 1 = q - 1 + \sum_x \frac{q^n - 1}{q^{n_x} - 1}$


By Zsigmondy's theorem there exists a prime $p$ that divides $q^n - 1$ but doesn't divide any of
the $q^m - 1$ for $0 < m < n$, except in 2 pathological cases which will be dealt with separately.
Such a prime $p$ will divide $q^n - 1$ and each of the $\frac{q^n - 1}{q^{n_x} - 1}$. So it will also divide $q - 1$ which
can only happen if $n = 1$.


We now deal with the 2 exceptional cases. In the first case $n$ equals 2, which would mean $D$ is
a vector space of dimension 2 over $Z(D)$, with elements of the form $a + b\alpha$ where $a, b \in Z(D)$.
Such elements clearly commute so $D = Z(D)$ which contradicts our assumption that $n = 2$.
In the second case, $n = 6$ and $q = 2$. The class equation reduces to $64 - 1 = 2 - 1 + \sum_x \frac{2^6 - 1}{2^{n_x} - 1}$
where $n_x$ divides 6. This gives $62 = 63x + 21y + 9z$ with $x$, $y$ and $z$ integers, which is
impossible since the right hand side is divisible by 3 and the left hand side isn't.
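The primes promised by Zsigmondy's theorem can be searched for directly when $q$ and $n$ are small. The sketch below is an illustration under assumptions: it factors $q^n - 1$ by naive trial division (adequate only for small inputs), and the helper names `prime_factors` and `primitive_prime_divisors` are hypothetical.

```python
def prime_factors(value):
    """Set of prime divisors of a positive integer, by trial division."""
    factors, d = set(), 2
    while d * d <= value:
        while value % d == 0:
            factors.add(d)
            value //= d
        d += 1
    if value > 1:
        factors.add(value)
    return factors

def primitive_prime_divisors(q, n):
    """Primes dividing q^n - 1 but none of q^m - 1 for 0 < m < n."""
    return sorted(p for p in prime_factors(q**n - 1)
                  if all((q**m - 1) % p != 0 for m in range(1, n)))

print(primitive_prime_divisors(2, 4))   # a generic case
print(primitive_prime_divisors(2, 6))   # the exceptional case n = 6, q = 2
print(primitive_prime_divisors(3, 2))   # an instance with n = 2 where no such prime exists
```

An empty list means that no prime of the required kind exists for that pair, which is why the exceptional cases need a separate argument.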
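161.10 second proof of Wedderburn's theorem

We can prove Wedderburn's theorem without using Zsigmondy's theorem on the conjugacy class formula of the first proof. Let $G_n$ be the set of n-th roots of unity, $P_n$ the set of primitive n-th roots of unity, and $\Phi_d(q)$ the d-th cyclotomic polynomial. Then:

• $\Phi_n(q) = \prod_{\xi \in P_n} (q - \xi)$
• $p(q) = q^n - 1 = \prod_{\xi \in G_n} (q - \xi) = \prod_{d \mid n} \Phi_d(q)$
• $\Phi_n(q) \in \mathbb{Z}[q]$, it is monic, and $\Phi_n(q) \mid q^n - 1$

By the last two properties it results:

$$\Phi_n(q) \mid q^n - 1, \quad \Phi_n(q) \mid \frac{q^n - 1}{q^{n_a} - 1} \implies \Phi_n(q) \mid q - 1,$$

because $\Phi_n(q)$ divides the left member and each addend of the sum $\sum$ in the right member of the conjugacy class formula. (Since $n_a$ is a proper divisor of n, the factor $\Phi_n(q)$ of $q^n - 1$ does not occur among the factors of $q^{n_a} - 1$.)

By the third property,

$$q > 1,\ \Phi_n(x) \in \mathbb{Z}[x] \implies \Phi_n(q) \in \mathbb{Z} \implies |\Phi_n(q)| \mid q - 1 \implies |\Phi_n(q)| \le q - 1.$$

If we show that $|\Phi_n(q)| > q - 1$ whenever n > 1, then this bound forces n = 1 and the theorem is proved.

We know that

$$|\Phi_n(q)| = \prod_{\xi \in P_n} |q - \xi|, \quad \text{with } q - \xi \in \mathbb{C}.$$

By the triangle inequality in C,

$$|q - \xi| \ge \big|\, |q| - |\xi| \,\big| = |q - 1|,$$

as ξ is a primitive root of unity (so |ξ| = 1); besides,

$$|q - \xi| = |q - 1| \iff \xi = 1,$$

but

$$n > 1 \implies \xi \ne 1,$$

therefore we have

$$|q - \xi| > |q - 1| = q - 1 \implies |\Phi_n(q)| > q - 1.$$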
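The final inequality can be spot-checked numerically. The sketch below computes the integer value of the n-th cyclotomic polynomial at q from the factorization of $q^n - 1$ over the divisors of n; the function name cyclotomic_value is an illustrative choice, and the check over a small range is of course no substitute for the proof.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cyclotomic_value(n, q):
    """Integer value Phi_n(q) for q >= 2, using q^n - 1 = prod_{d | n} Phi_d(q)."""
    value = q**n - 1
    for d in range(1, n):
        if n % d == 0:
            # Exact division: Phi_d(q) is one of the integer factors of q^n - 1.
            value //= cyclotomic_value(d, q)
    return value

# For n > 1 the value should exceed q - 1 (checked on a small range only).
holds = all(cyclotomic_value(n, q) > q - 1
            for q in range(2, 8) for n in range(2, 25))
print(cyclotomic_value(6, 2), holds)
```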
161.11 finite field

A finite field is a field F which has finitely many elements. We will present some basic facts about finite fields.

161.11.1 Size of a finite field

Theorem 9. A finite field F has positive characteristic p > 0. The cardinality of F is $p^n$ where $n := [F : \mathbb{F}_p]$ and $\mathbb{F}_p$ denotes the prime subfield of F.

Proof. The characteristic of F is positive because otherwise the additive subgroup generated by 1 would be an infinite subset of F. Accordingly, the prime subfield $\mathbb{F}_p$ of F is isomorphic to the field Z/pZ of integers mod p. Since the field F is an n-dimensional vector space over $\mathbb{F}_p$, it is set-isomorphic to $\mathbb{F}_p^n$ and thus has cardinality $p^n$.

161.11.2 Existence of finite fields

Now that we know every finite field has $p^n$ elements, it is natural to ask which of these actually arise as cardinalities of finite fields. It turns out that for each prime p and each natural number n, there is essentially exactly one finite field of size $p^n$.
Lemma 2. In any field F with m elements, the equation $x^m = x$ is satisfied by all elements x of F.

Proof. The result is clearly true if x = 0. We may therefore assume x is not zero. By definition of field, the set $F^\times$ of nonzero elements of F forms a group under multiplication. This set has m − 1 elements, and by Lagrange's theorem $x^{m-1} = 1$ for any $x \in F^\times$, so $x^m = x$ follows.
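For the prime fields Z/pZ the lemma is easy to test by machine. This is a minimal sketch restricted to that special case; the helper name satisfies_lemma is hypothetical.

```python
def satisfies_lemma(m):
    """Check x^m = x for every x in Z/mZ (a field only when m is prime)."""
    return all(pow(x, m, m) == x for x in range(m))

print([p for p in (2, 3, 5, 7, 11, 13) if satisfies_lemma(p)])
print(satisfies_lemma(4))  # Z/4Z is not a field, and x = 2 fails here
```

Note that the check passing for some m does not by itself show that Z/mZ is a field; the lemma only gives the forward direction.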


Theorem 10. For each prime p > 0 and each natural number n ∈ N, there exists a finite field of cardinality $p^n$, and any two such are isomorphic.

Proof. For n = 1, the finite field $\mathbb{F}_p := \mathbb{Z}/p\mathbb{Z}$ has p elements, and any two such are isomorphic by the map sending 1 to 1.

In general, the polynomial $f(X) := X^{p^n} - X \in \mathbb{F}_p[X]$ has derivative −1 and thus is separable over $\mathbb{F}_p$. We claim that the splitting field F of this polynomial is a finite field of size $p^n$. The field F certainly contains the set S of roots of f(X). However, the set S is closed under the field operations, so S is itself a field. Since splitting fields are minimal by definition, the containment S ⊆ F means that S = F. Finally, S has $p^n$ elements since f(X) is separable, so F is a field of size $p^n$.

For the uniqueness part, any other field F′ of size $p^n$ contains a subfield isomorphic to $\mathbb{F}_p$. Moreover, F′ equals the splitting field of the polynomial $X^{p^n} - X$ over $\mathbb{F}_p$, since by Lemma 2 every element of F′ is a root of this polynomial, and all $p^n$ possible roots of the polynomial are accounted for in this way. By the uniqueness of splitting fields up to isomorphism, the two fields F and F′ are isomorphic.

Note: The proof of Theorem 10 given here, while standard because of its efficiency, relies on more abstract algebra than is strictly necessary. The reader may find a more concrete proof of this and many other results about finite fields in [1, Ch. 7].

Corollary 7. Every finite field F is a normal extension of its prime subfield $\mathbb{F}_p$.

Proof. This follows from the fact that field extensions obtained from splitting fields are normal extensions.


161.11.3 Units in a finite field 


Henceforth, in light of Theorem 10, we will write $\mathbb{F}_q$ for the unique (up to isomorphism) finite field of cardinality $q = p^n$. A fundamental step in the investigation of finite fields is the observation that their multiplicative groups are cyclic:

Theorem 11. Let $\mathbb{F}_q^*$ denote the multiplicative group of nonzero elements of the finite field $\mathbb{F}_q$. Then $\mathbb{F}_q^*$ is a cyclic group.

Proof. We begin with the formula

$$\sum_{d \mid k} \phi(d) = k, \qquad (161.11.1)$$

where φ denotes the Euler totient function. It is proved as follows. For every divisor d of k, the cyclic group $C_k$ of size k has exactly one cyclic subgroup $C_d$ of size d. Let $G_d$ be the subset of $C_d$ consisting of elements of $C_d$ which have the maximum possible order of d. Since every element of $C_k$ has maximal order in the subgroup of $C_k$ that it generates, we see that the sets $G_d$ partition the set $C_k$, so that

$$\sum_{d \mid k} |G_d| = |C_k| = k.$$

The identity (161.11.1) then follows from the observation that the cyclic subgroup $C_d$ has exactly φ(d) elements of maximal order d.


We now prove the theorem. Let k = q − 1, and for each divisor d of k, let ψ(d) be the number of elements of $\mathbb{F}_q^*$ of order d. We claim that ψ(d) is either zero or φ(d). Indeed, if it is nonzero, then let $x \in \mathbb{F}_q^*$ be an element of order d, and let $G_x$ be the subgroup of $\mathbb{F}_q^*$ generated by x. Then $G_x$ has size d and every element of $G_x$ is a root of the polynomial $X^d - 1$. But this polynomial cannot have more than d roots in a field, so every root of $X^d - 1$ must be an element of $G_x$. In particular, every element of order d must be in $G_x$ already, and we see that $G_x$ has exactly φ(d) elements of order d.

We have proved that ψ(d) ≤ φ(d) for all d | q − 1. If ψ(q − 1) were 0, then we would have

$$\sum_{d \mid q-1} \psi(d) < \sum_{d \mid q-1} \phi(d) = q - 1,$$

which is impossible since the first sum must equal q − 1 (because every element of $\mathbb{F}_q^*$ has order equal to some divisor d of q − 1). Hence ψ(q − 1) > 0, so $\mathbb{F}_q^*$ contains an element of order q − 1 and is cyclic.
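The counting argument can be observed directly in a prime field. The sketch below tabulates, for the multiplicative group of Z/pZ, how many elements have each order and compares the counts with the totient function; all function names are illustrative, and only the case q = p prime is covered.

```python
from collections import Counter
from math import gcd

def multiplicative_order(x, p):
    """Order of x in the multiplicative group of Z/pZ (x not divisible by p)."""
    k, y = 1, x % p
    while y != 1:
        y = (y * x) % p
        k += 1
    return k

def euler_phi(d):
    """Euler totient of d, by direct counting."""
    return sum(1 for i in range(1, d + 1) if gcd(i, d) == 1)

def order_counts(p):
    """Map each order d to the number of elements of (Z/pZ)^* of order d."""
    return Counter(multiplicative_order(x, p) for x in range(1, p))

counts = order_counts(13)
generators = [x for x in range(1, 13) if multiplicative_order(x, 13) == 12]
print(dict(counts), generators)
```

For p = 13 every divisor d of 12 occurs as an order exactly φ(d) times, and the elements of order 12 are the generators of the cyclic group.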


A more constructive proof of Theorem 11, which actually exhibits a generator for the cyclic group, may be found in [2, Ch. 16].


161.11.4 Automorphisms of a finite field 


Observe that, since a splitting field for $X^{q^m} - X$ over $\mathbb{F}_p$ contains all the roots of $X^q - X$, it follows that the field $\mathbb{F}_{q^m}$ contains a subfield isomorphic to $\mathbb{F}_q$. We will show later (Theorem 13) that this is the only way that extensions of finite fields can arise. For now we will construct the Galois group of the field extension $\mathbb{F}_{q^m}/\mathbb{F}_q$, which is normal by Corollary 7.

Theorem 12. The Galois group of the field extension $\mathbb{F}_{q^m}/\mathbb{F}_q$ is a cyclic group of size m generated by the q-th power Frobenius map $\mathrm{Frob}_q$.

Proof. The fact that $\mathrm{Frob}_q$ is an element of $\mathrm{Gal}(\mathbb{F}_{q^m}/\mathbb{F}_q)$, and that $(\mathrm{Frob}_q)^m = \mathrm{Frob}_{q^m}$ is the identity on $\mathbb{F}_{q^m}$, is obvious. Since the extension $\mathbb{F}_{q^m}/\mathbb{F}_q$ is normal and of degree m, the group $\mathrm{Gal}(\mathbb{F}_{q^m}/\mathbb{F}_q)$ must have size m, and we will be done if we can show that $(\mathrm{Frob}_q)^k$, for k = 0, 1, ..., m − 1, are distinct elements of $\mathrm{Gal}(\mathbb{F}_{q^m}/\mathbb{F}_q)$.

It is enough to show that none of $(\mathrm{Frob}_q)^k$, for k = 1, 2, ..., m − 1, is the identity map on $\mathbb{F}_{q^m}$, for then we will have shown that $\mathrm{Frob}_q$ is of order exactly equal to m. But, if any such $(\mathrm{Frob}_q)^k$ were the identity map, then the polynomial $X^{q^k} - X$ would have $q^m$ distinct roots in $\mathbb{F}_{q^m}$, which is impossible in a field since $q^k < q^m$.
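The smallest non-trivial instance, q = 2 and m = 3, can be checked by brute force. The sketch below models the field with 8 elements as polynomials over the field with 2 elements modulo $x^3 + x + 1$; this representation, the modulus and all names are assumptions made for illustration.

```python
# Field with 8 elements as F_2[x]/(x^3 + x + 1); an element is a 3-bit integer
# whose bits are the coefficients of 1, x, x^2. The modulus is an illustrative
# choice (it has no root in F_2, hence is irreducible of degree 3).
MODULUS, DEGREE = 0b1011, 3

def gf_mul(a, b):
    """Carry-less product of a and b, reduced modulo MODULUS."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << DEGREE):
            a ^= MODULUS
    return result

def frobenius(x):
    """The 2nd power Frobenius map x -> x^2."""
    return gf_mul(x, x)

def iterate(f, k, x):
    """Apply f to x exactly k times."""
    for _ in range(k):
        x = f(x)
    return x

elements = range(8)
# Smallest k >= 1 for which the k-th iterate of Frobenius is the identity.
order = next(k for k in range(1, 9)
             if all(iterate(frobenius, k, x) == x for x in elements))
fixed_points = [x for x in elements if frobenius(x) == x]
print(order, fixed_points)
```

The Frobenius map has order m = 3 here, and its fixed points are exactly the two elements of the prime field, in line with the theorem.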


We can now use the Galois correspondence between subgroups of the Galois group and intermediate fields of a field extension to immediately classify all the intermediate fields in the extension $\mathbb{F}_{q^m}/\mathbb{F}_q$.

Theorem 13. The field extension $\mathbb{F}_{q^m}/\mathbb{F}_q$ contains exactly one intermediate field isomorphic to $\mathbb{F}_{q^d}$, for each divisor d of m, and no others. In particular, the subfields of $\mathbb{F}_{p^n}$ are precisely the fields $\mathbb{F}_{p^d}$ for d | n.

Proof. By the fundamental theorem of Galois theory, each intermediate field of $\mathbb{F}_{q^m}/\mathbb{F}_q$ corresponds to a subgroup of $\mathrm{Gal}(\mathbb{F}_{q^m}/\mathbb{F}_q)$. The latter is a cyclic group of order m, so its subgroups are exactly the cyclic groups generated by $(\mathrm{Frob}_q)^d$, one for each d | m. The fixed field of $(\mathrm{Frob}_q)^d$ is the set of roots of $X^{q^d} - X$, which forms a subfield of $\mathbb{F}_{q^m}$ isomorphic to $\mathbb{F}_{q^d}$, so the result follows.

The subfields of $\mathbb{F}_{p^n}$ can be obtained by applying the above considerations to the extension $\mathbb{F}_{p^n}/\mathbb{F}_p$.
161.12 Frobenius automorphism

Let F be a field of characteristic p > 0. Then for any a, b ∈ F,

$$(a + b)^p = a^p + b^p,$$

$$(ab)^p = a^p b^p.$$

Thus the map

$$F \to F, \quad a \mapsto a^p$$

is an injective field endomorphism (and an automorphism whenever F is finite), called the Frobenius automorphism, or simply the Frobenius map on F.

Note: This morphism is sometimes also called the "small Frobenius" to distinguish it from the map $a \mapsto a^q$, with $q = p^n$. This map is then also referred to as the "big Frobenius" or the "power Frobenius map".
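The additivity of the p-th power map rests on the fact that the binomial coefficients $\binom{p}{k}$ with 0 < k < p are all divisible by p. The short sketch below checks this divisibility for a few values; the function name is an illustrative choice.

```python
from math import comb

def middle_binomials_vanish(p):
    """True if every binomial coefficient C(p, k) with 0 < k < p is divisible by p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))

print([p for p in range(2, 20) if middle_binomials_vanish(p)])
```

The check succeeds for primes and fails, for example, for p = 4, where $\binom{4}{2} = 6$ is not divisible by 4.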
161.13 characteristic

Let (F, +, ·) be a field. The characteristic Char(F) of F is commonly given by one of three equivalent definitions:

• if there is some positive integer n for which the result of adding 1 to itself n times yields 0, then the characteristic of the field is the least such n. Otherwise, Char(F) is defined to be 0.

• if f : Z → F is defined by f(n) = n·1 then Char(F) is the least strictly positive generator of ker(f) if ker(f) ≠ {0}; otherwise it is 0.

• if K is the prime subfield of F, then Char(F) is the size of K if this is finite, and 0 otherwise.

Note that the first two definitions also apply to arbitrary rings and not just to fields.

The characteristic of a field (or more generally an integral domain) is always either 0 or prime. For if the characteristic of F were composite, say mn for m, n > 1, then in particular mn·1 = (m·1)(n·1) would equal zero. Then either m·1 would be zero or n·1 would be zero, so the characteristic of F would actually be smaller than mn, contradicting the minimality condition.
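The first definition can be applied literally to the rings Z/nZ. The sketch below does so and shows why a composite characteristic is incompatible with being an integral domain; the function name characteristic_mod is hypothetical.

```python
def characteristic_mod(n):
    """Least k > 0 such that the sum of k copies of 1 is 0 in Z/nZ."""
    total, k = 1 % n, 1
    while total != 0:
        total = (total + 1) % n
        k += 1
    return k

print(characteristic_mod(7), characteristic_mod(6))
# In Z/6Z the characteristic 6 = 2 * 3 is composite, and 2 and 3 are
# zero divisors, so Z/6Z is not an integral domain.
print((2 * 3) % 6)
```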
161.14 characterization of field

Let R ≠ 0 be a commutative ring with identity.

Proposition 6. The ring R (as above) is a field if and only if R has exactly two ideals: (0), R.

Proof. (⇒) Suppose R is a field and let A be a non-zero ideal of R. Then there exists r ∈ A ⊆ R with r ≠ 0. Since R is a field and r is a non-zero element, there exists s ∈ R such that

s · r = 1 ∈ R.

Moreover, A is an ideal, r ∈ A, s ∈ R, so s · r = 1 ∈ A. Hence A = R. We have proved that the only ideals of R are (0) and R, as desired.

(⇐) Suppose the ring R has only two ideals, namely (0), R. Let a ∈ R be a non-zero element; we would like to prove the existence of a multiplicative inverse for a in R. Define the following set:

A = (a) = {r ∈ R | r = s · a, for some s ∈ R}.

This is an ideal, the ideal generated by the element a. Moreover, this ideal is not the zero ideal because a ∈ A and a was assumed to be non-zero. Thus, since there are only two ideals, we conclude A = R. Therefore 1 ∈ A = R, so there exists an element s ∈ R such that

s · a = 1 ∈ R.

Hence for all non-zero a ∈ R, a has a multiplicative inverse in R, so R is, in fact, a field.
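The proposition can be tested on the rings Z/nZ, where every ideal is principal, so that listing the principal ideals lists all ideals. This is a small sketch with hypothetical helper names.

```python
def ideals_mod(n):
    """All ideals of Z/nZ; each is principal, so it is the set of multiples of some a."""
    return {frozenset((a * r) % n for r in range(n)) for a in range(n)}

def is_field_mod(n):
    """True if every non-zero element of Z/nZ has a multiplicative inverse."""
    return n > 1 and all(any((a * b) % n == 1 for b in range(n))
                         for a in range(1, n))

for n in (4, 6, 7):
    print(n, len(ideals_mod(n)), is_field_mod(n))
```

For n = 7 there are exactly two ideals and the ring is a field; for n = 4 and n = 6 there are additional proper ideals and inverses are missing.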
161.15 example of an infinite field of finite characteristic

Let K be a field of finite characteristic, such as $\mathbb{F}_p$. Then the ring of polynomials, K[X], is an integral domain. We may therefore construct its quotient field, namely the field of rational functions K(X). This is an example of an infinite field with finite characteristic.


Version: 1 Owner: vitriol Author(s): vitriol 


161.16 examples of fields 


Fields are typically sets of "numbers" in which the arithmetic operations of addition, subtraction, multiplication and division are defined. Another important class of fields are the function fields defined on geometric objects such as varieties or Riemann surfaces.

The following is a list of examples of fields.

• The rational numbers Q, the real numbers R and the complex numbers C are the most familiar examples of fields.

• Slightly more exotic, the hyperreal numbers and the surreal numbers are fields containing infinitesimal and infinitely large numbers. (The surreal numbers aren't a field in the strict sense since they form a proper class and not a set.)

• The algebraic numbers form a field; this is the algebraic closure of Q. In general, every field has an (essentially unique) algebraic closure.

• The computable complex numbers (those whose digit sequence can be produced by a Turing machine) form a field. The definable complex numbers (those which can be precisely specified using a logical formula) form a field containing the computable numbers; arguably, this field contains all the numbers we can ever talk about. It is countable.

• The so-called "number fields" arise from Q by adding some algebraic numbers. For instance $\mathbb{Q}(\sqrt{2}) = \{u + v\sqrt{2} \mid u, v \in \mathbb{Q}\}$ and $\mathbb{Q}(\sqrt[3]{2}, i) = \{u + vi + w\sqrt[3]{2} + xi\sqrt[3]{2} + y\sqrt[3]{4} + zi\sqrt[3]{4} \mid u, v, w, x, y, z \in \mathbb{Q}\}$.

• If p is a prime number, then the p-adic rationals form a field $\mathbb{Q}_p$.

• If p is a prime number, then the integers modulo p form a finite field with p elements, typically denoted by $\mathbb{F}_p$. More generally, for every prime power $p^n$ there is one and only one finite field $\mathbb{F}_{p^n}$ with $p^n$ elements (up to isomorphism).

• If K is a field, we can form the field of rational functions over K, denoted by K(X). It consists of quotients of polynomials in X with coefficients in K.

• If V is a variety over the field K, then the function field of V, denoted by K(V), consists of all quotients of polynomial functions defined on V.

• If U is a domain (= connected open set) in C, then the set of all meromorphic functions on U is a field. More generally, the meromorphic functions on any Riemann surface form a field.

• The field of formal Laurent series over the field K in the variable X consists of all expressions of the form

$$\sum_{j=-M}^{\infty} a_j X^j$$

where M is some integer and the coefficients $a_j$ come from K.

• More generally, whenever R is an integral domain, we can form its field of fractions, a field whose elements are the fractions of elements of R.
161.17 field

A field is a commutative ring F with identity such that:

• 1 ≠ 0

• If a ∈ F, and a ≠ 0, then there exists b ∈ F with a · b = 1.
161.18 field homomorphism

Let F and K be fields.

Definition 11. A field homomorphism is a function ψ : F → K such that:

1. ψ(a + b) = ψ(a) + ψ(b) for all a, b ∈ F
2. ψ(a · b) = ψ(a) · ψ(b) for all a, b ∈ F
3. ψ(1) = 1, ψ(0) = 0

If ψ is injective and surjective then we say that ψ is a field isomorphism.

Lemma 3. Let ψ : F → K be a field homomorphism. Then ψ is injective.

Proof. Indeed, if ψ is a field homomorphism, in particular it is a ring homomorphism. Note that the kernel of a ring homomorphism is an ideal, and a field F only has two ideals, namely {0}, F. Moreover, by the definition of field homomorphism, ψ(1) = 1, hence 1 is not in the kernel of the map, so the kernel must be equal to {0}.

Remark: For this reason the terms "field homomorphism" and "field monomorphism" are synonymous. Also note that if ψ is a field monomorphism, then

ψ(F) ≅ F, ψ(F) ⊆ K,

so there is a "copy" of F in K. In other words, if

ψ : F → K

is a field homomorphism then there exists a subfield H of K such that H ≅ F. Conversely, suppose there exists H ⊂ K with H isomorphic to F. Then there is an isomorphism

χ : F → H

and we also have the inclusion homomorphism

ι : H → K.

Thus the composition

ι ∘ χ : F → K

is a field homomorphism.
161.19 prime subfield

The prime subfield of a field F is the intersection of all subfields of F, or equivalently the smallest subfield of F. It can also be constructed by taking the quotient field of the additive subgroup of F generated by the multiplicative identity 1.

If F has characteristic p where p > 0 is a prime, then the prime subfield of F is isomorphic to the field Z/pZ of integers mod p. When F has characteristic zero, the prime subfield of F is isomorphic to the field Q of rational numbers.
Chapter 162


12F05 — Algebraic extensions 


162.1 a finite extension of fields is an algebraic extension

Theorem 16. Let L/K be a finite field extension. Then L/K is an algebraic extension.

Proof. In order to prove that L/K is an algebraic extension, we need to show that any element α ∈ L is algebraic, i.e., there exists a non-zero polynomial p(x) ∈ K[x] such that p(α) = 0.

Recall that L/K is a finite extension of fields; by definition, this means that L is a finite dimensional vector space over K. Let the dimension be

[L : K] = n

for some n ∈ N.

Consider the following set of "vectors" in L:

$$S = \{1, \alpha, \alpha^2, \alpha^3, \ldots, \alpha^n\}$$

Note that the cardinality of S is n + 1, one more than the dimension of the vector space. (If two of the listed powers coincide, say $\alpha^i = \alpha^j$ with i ≠ j, then α is already a root of the non-zero polynomial $x^i - x^j$.) Therefore, the elements of S must be linearly dependent over K, otherwise the dimension of L would be greater than n. Hence, there exist $k_i \in K$, 0 ≤ i ≤ n, not all zero, such that

$$k_0 + k_1 \alpha + k_2 \alpha^2 + k_3 \alpha^3 + \ldots + k_n \alpha^n = 0$$

Thus, if we define

$$p(X) = k_0 + k_1 X + k_2 X^2 + k_3 X^3 + \ldots + k_n X^n$$

then p(X) ∈ K[X] and p(α) = 0, as desired.

NOTE: The converse is not true. See the entry "algebraic extension" for details.
162.2 algebraic closure

An extension field L of a field K is an algebraic closure of K if L is algebraically closed and every element of L is algebraic over K.

Any two algebraic closures of K are isomorphic as fields, but not necessarily canonically.
162.3 algebraic extension

Definition 12. Let L/K be an extension of fields. L/K is said to be an algebraic extension of fields if every element of L is algebraic over K.

Examples:

1. Let $L = \mathbb{Q}(\sqrt{2})$. The extension L/Q is an algebraic extension. Indeed, any element α ∈ L is of the form

$$\alpha = q + t\sqrt{2} \in L$$

for some q, t ∈ Q. Then α ∈ L is a root of

$$X^2 - 2qX + q^2 - 2t^2 = 0$$

2. The field extension R/Q is not an algebraic extension. For example, π ∈ R is a transcendental number over Q (see pi).

3. Let K be a field and denote by $\overline{K}$ the algebraic closure of K. Then the extension $\overline{K}/K$ is algebraic.

4. In general, a finite extension of fields is an algebraic extension. However, the converse is not true. The extension $\overline{\mathbb{Q}}/\mathbb{Q}$ is far from finite.
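The quadratic in Example 1 can be verified with exact rational arithmetic. In the sketch below an element $q + t\sqrt{2}$ is stored as the pair (q, t); this encoding and the function names are illustrative assumptions.

```python
from fractions import Fraction

def mul(a, b):
    """Product of a = (a0, a1) and b = (b0, b1), read as a0 + a1*sqrt(2)."""
    return (a[0] * b[0] + 2 * a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def evaluate_quadratic(alpha):
    """Value of X^2 - 2qX + (q^2 - 2t^2) at alpha = q + t*sqrt(2), as a pair."""
    q, t = alpha
    square = mul(alpha, alpha)
    rational_part = square[0] - 2 * q * alpha[0] + (q * q - 2 * t * t)
    sqrt2_part = square[1] - 2 * q * alpha[1]
    return (rational_part, sqrt2_part)

alpha = (Fraction(3, 5), Fraction(-7, 2))
print(evaluate_quadratic(alpha))
```

Both components of the result vanish, so the chosen element is indeed a root of the quadratic with rational coefficients.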
162.4 algebraically closed

A field K is algebraically closed if every non-constant polynomial in K[X] has a root in K.

162.5 algebraically dependent

Let L be a field extension of a field K. Two elements α, β of L are algebraically dependent if there exists a non-zero polynomial f(x, y) ∈ K[x, y] such that f(α, β) = 0. If no such polynomial exists, α and β are said to be algebraically independent.
162.6 existence of the minimal polynomial

Proposition 7. Let K/L be a finite extension of fields and let k ∈ K. There exists a unique polynomial $m_k(x) \in L[x]$ such that:

1. $m_k(x)$ is a monic polynomial;

2. $m_k(k) = 0$;

3. If p(x) ∈ L[x] is another polynomial such that p(k) = 0, then $m_k(x)$ divides p(x).

Proof. We start by defining the following map:

ψ : L[x] → K

ψ(p(x)) = p(k)

Note that this map is clearly a ring homomorphism. For all p(x), q(x) ∈ L[x]:

• ψ(p(x) + q(x)) = p(k) + q(k) = ψ(p(x)) + ψ(q(x))
• ψ(p(x) · q(x)) = p(k) · q(k) = ψ(p(x)) · ψ(q(x))

Thus, the kernel of ψ is an ideal of L[x]:

Ker(ψ) = {p(x) ∈ L[x] | p(k) = 0}

Note that the kernel is a non-zero ideal. This fact relies on the fact that K/L is a finite extension of fields, and therefore it is an algebraic extension, so every element of K is a root of a non-zero polynomial p(x) with coefficients in L, that is, p(x) ∈ Ker(ψ).

Moreover, the ring of polynomials L[x] is a principal ideal domain (see example of PID). Therefore, the kernel of ψ is a principal ideal, generated by some polynomial m(x):

Ker(ψ) = (m(x))

Note that the only units in L[x] are the non-zero constant polynomials, hence if m′(x) is another generator of Ker(ψ) then

m′(x) = l · m(x), l ≠ 0, l ∈ L

Let α be the leading coefficient of m(x). We define $m_k(x) = \alpha^{-1} m(x)$, so that the leading coefficient of $m_k$ is 1. Also note that by the previous remark, $m_k$ is the unique generator of Ker(ψ) which is monic.

By construction, $m_k(k) = 0$, since $m_k$ belongs to the kernel of ψ, so it satisfies (2).

Finally, if p(x) is any polynomial such that p(k) = 0, then p(x) ∈ Ker(ψ). Since $m_k$ generates this ideal, we know that $m_k$ must divide p(x) (this is property (3)).

For the uniqueness, note that any polynomial satisfying (2) and (3) must be a generator of Ker(ψ), and, as we pointed out, there is a unique monic generator, namely $m_k(x)$.
162.7 finite extension

Let K be an extension field of F. We say that K is a finite extension if [K : F] is finite. That is, K is a finite dimensional vector space over F.

An important result on finite extensions establishes that any finite extension is also an algebraic extension.

162.8 minimal polynomial

Let K/L be a finite field extension. Then if κ ∈ K, the minimal polynomial m(x) ∈ L[x] of κ is the unique monic, non-zero polynomial such that m(κ) = 0 and any other polynomial f ∈ L[x] with f(κ) = 0 is divisible by m.

Given κ, a polynomial m is the minimal polynomial of κ if and only if m is monic, irreducible, and m(κ) = 0.
162.9 norm

Let K/F be a Galois extension and let x ∈ K. The norm $N^K_F(x)$ of x is defined to be the product of all the elements of the orbit of x under the group action of the Galois group Gal(K/F) on K; taken with multiplicities if K/F is a finite extension.

In the case where K/F is a finite extension, the norm of x can be defined to be the determinant of the linear transformation [x] : K → K given by [x](k) := xk, where K is regarded as a vector space over F. This definition does not require that K/F be Galois, or even that K be a field; for instance, it remains valid when K is a division ring (although F does have to be a field in order for the determinant to be defined). Of course, for finite Galois extensions K/F, this definition agrees with the previous one, and moreover the formula

$$N^K_F(x) := \prod_{\sigma \in \mathrm{Gal}(K/F)} \sigma(x)$$

holds.

The norm of x is always an element of F, since any element of Gal(K/F) permutes the orbit of x and thus fixes $N^K_F(x)$.
162.10 primitive element theorem

If F is a field of characteristic 0, and a and b are algebraic over F, then there is an element c in F(a, b) such that F(a, b) = F(c).

162.11 splitting field

Let f ∈ F[x] be a polynomial over a field F. A splitting field for f is a field extension K of F such that

1. f splits (factors into a product of linear factors) in K[x],

2. K is the smallest field with this property (any sub-extension field of K which satisfies the first property is equal to K).

Theorem: Any polynomial over any field has a splitting field, and any two such splitting fields are isomorphic. A splitting field is always a normal extension of the ground field.


Version: 3 Owner: djao Author(s): djao 


162.12 the field extension R/Q is not finite 


Theorem 17. Let L/K be a finite field extension. Then L/K is an algebraic extension.

Corollary 2. The extension of fields R/Q is not finite.

Proof of the Corollary. If the extension were finite, it would be an algebraic extension. However, the extension R/Q is not algebraic. For example, π ∈ R is transcendental over Q (see pi).
162.13 trace

Let K/F be a Galois extension and let x ∈ K. The trace $\mathrm{Tr}^K_F(x)$ of x is defined to be the sum of all the elements of the orbit of x under the group action of the Galois group Gal(K/F) on K; taken with multiplicities if K/F is a finite extension.

In the case where K/F is a finite extension,

$$\mathrm{Tr}^K_F(x) := \sum_{\sigma \in \mathrm{Gal}(K/F)} \sigma(x)$$

The trace of x is always an element of F, since any element of Gal(K/F) permutes the orbit of x and thus fixes $\mathrm{Tr}^K_F(x)$.

The name "trace" derives from the fact that, when K/F is finite, the trace of x is simply the trace of the linear transformation T : K → K of vector spaces over F defined by T(v) := xv.
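For the quadratic extension obtained by adjoining $\sqrt{2}$ to Q, the linear-algebra description of the trace (and of the norm, as a determinant) can be compared with the sum and product over the two Galois conjugates $a \pm b\sqrt{2}$. The sketch below works in the basis $(1, \sqrt{2})$; the basis choice and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def multiplication_matrix(a, b):
    """Matrix of v -> (a + b*sqrt(2)) * v in the basis (1, sqrt(2)).

    Columns are the images of the basis vectors:
    (a + b*sqrt(2)) * 1 = a + b*sqrt(2) and (a + b*sqrt(2)) * sqrt(2) = 2b + a*sqrt(2).
    """
    return [[a, 2 * b], [b, a]]

def trace(m):
    return m[0][0] + m[1][1]

def determinant(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

a, b = Fraction(3), Fraction(2)
m = multiplication_matrix(a, b)
# Sum and product of the conjugates a + b*sqrt(2) and a - b*sqrt(2):
conjugate_sum = 2 * a
conjugate_product = a * a - 2 * b * b
print(trace(m), conjugate_sum, determinant(m), conjugate_product)
```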
Chapter 163


12F10 — Separable extensions, Galois 
theory 


163.1 Abelian extension 


Let K be a Galois extension of F. The extension is said to be an abelian extension if the Galois group Gal(K/F) is abelian.

Examples: $\mathbb{Q}(\sqrt{2})/\mathbb{Q}$ has Galois group Z/2Z so $\mathbb{Q}(\sqrt{2})/\mathbb{Q}$ is an abelian extension.

Let $\zeta_n$ be a primitive nth root of unity. Then $\mathbb{Q}(\zeta_n)/\mathbb{Q}$ has Galois group $(\mathbb{Z}/n\mathbb{Z})^*$ (the group of units of Z/nZ) so $\mathbb{Q}(\zeta_n)/\mathbb{Q}$ is abelian.
163.2 Fundamental Theorem of Galois Theory

Let L/F be a finite dimensional Galois extension of fields, with Galois group G := Gal(L/F). There is a bijective, inclusion-reversing correspondence between subgroups of G and extensions of F contained in L, given by

• K ↦ Gal(L/K), for any field K with F ⊆ K ⊆ L.
• H ↦ $L^H$ (the fixed field of H in L), for any subgroup H ⊆ G.

The extension $L^H/F$ is normal if and only if H is a normal subgroup of G, and in this case the homomorphism $G \to \mathrm{Gal}(L^H/F)$ given by $\sigma \mapsto \sigma|_{L^H}$ induces (via the first isomorphism theorem) a natural identification $\mathrm{Gal}(L^H/F) = G/H$ between the Galois group of $L^H/F$ and the quotient group G/H.
163.3 Galois closure

Let K be an extension field of F. A Galois closure of K/F is a field L ⊇ K that is a Galois extension of F and is minimal in that respect, i.e. no proper subfield of L containing K is normal over F.

163.4 Galois conjugate

Let K be a field, and let L be a separable closure of K. For any x ∈ L, the Galois conjugates of x are the elements of L which are in the orbit of x under the group action of the absolute Galois group $G_K$ on L.

163.5 Galois extension

A field extension is Galois if it is normal and separable.

163.6 Galois group

The Galois group Gal(K/F) of a field extension K/F is the group of all field automorphisms σ : K → K of K which fix F (i.e., σ(x) = x for all x ∈ F).

The group operation is given by composition: for two automorphisms $\sigma_1, \sigma_2 \in \mathrm{Gal}(K/F)$, given by $\sigma_1 : K \to K$ and $\sigma_2 : K \to K$, the product $\sigma_1 \cdot \sigma_2 \in \mathrm{Gal}(K/F)$ is the composite of the two maps $\sigma_1 \circ \sigma_2 : K \to K$.

163.7 absolute Galois group

Let k be a field. The absolute Galois group $G_k$ of k is the Galois group $\mathrm{Gal}(k^{sep}/k)$ of the field extension $k^{sep}/k$, where $k^{sep}$ is the separable closure of k.

163.8 cyclic extension

A Galois extension K/F is said to be a cyclic extension if the Galois group Gal(K/F) is cyclic.
163.9 example of nonperfect field

Let $F = \mathbb{F}_p(t)$, where $\mathbb{F}_p$ is the field with p elements. The splitting field E of the irreducible polynomial $f = x^p - t$ is not separable over F. Indeed, if α is an element of E such that $\alpha^p = t$, we have

$$x^p - t = x^p - \alpha^p = (x - \alpha)^p,$$

which shows that f has one root of multiplicity p.

163.10 fixed field

Let K/F be a field extension with Galois group G = Gal(K/F), and let H be a subgroup of G. The fixed field of H in K is the set

$$K^H := \{x \in K \mid \sigma(x) = x \text{ for all } \sigma \in H\}.$$

The set $K^H$ is always a field, and $F \subseteq K^H \subseteq K$.
163.11 infinite Galois theory

Let L/F be a Galois extension, not necessarily finite dimensional.

163.11.1 Topology on the Galois group

Recall that the Galois group G := Gal(L/F) of L/F is the group of all field automorphisms σ : L → L that restrict to the identity map on F, under the group operation of composition. In the case where the extension L/F is infinite dimensional, the group G comes equipped with a topology, which plays a key role in the statement of the Galois correspondence.

We define a subset U of G to be open if, for each σ ∈ U, there exists an intermediate field K ⊆ L such that

• The degree [K : F] is finite,

• If σ′ is another element of G, and the restrictions $\sigma|_K$ and $\sigma'|_K$ are equal, then σ′ ∈ U.

The resulting collection of open sets forms a topology on G, called the Krull topology, and G is a topological group under the Krull topology.

163.11.2 Inverse limit structure

In this section we exhibit the group G as a projective limit of an inverse system of finite groups. This construction shows that the Galois group G is actually a profinite group.

Let A denote the set of finite normal extensions K of F which are contained in L. The set A is a partially ordered set under the inclusion relation. Form the inverse limit

$$\Gamma := \varprojlim \mathrm{Gal}(K/F) \subset \prod_{K \in A} \mathrm{Gal}(K/F)$$

consisting, as usual, of the set of all $(\sigma_K) \in \prod_{K \in A} \mathrm{Gal}(K/F)$ such that $\sigma_{K'}|_K = \sigma_K$ for all K, K′ ∈ A with K ⊆ K′. We make Γ into a topological space by putting the discrete topology on each finite set Gal(K/F) and giving Γ the subspace topology induced by the product topology on $\prod_{K \in A} \mathrm{Gal}(K/F)$. The group Γ is a closed subset of the compact group $\prod_{K \in A} \mathrm{Gal}(K/F)$, and is therefore compact.

Let

$$\phi : G \to \prod_{K \in A} \mathrm{Gal}(K/F)$$

be the group homomorphism which sends an element σ ∈ G to the element $(\sigma_K)$ of $\prod_{K \in A} \mathrm{Gal}(K/F)$ whose K-th coordinate is the automorphism $\sigma|_K \in \mathrm{Gal}(K/F)$. Then the function φ has image equal to Γ and in fact is a homeomorphism between G and Γ. Since Γ is profinite, it follows that G is profinite as well.
163.11.3 The Galois correspondence

Theorem 14 (Galois correspondence for infinite extensions). Let G, L, F be as before. For every closed subgroup H of G, let $L^H$ denote the fixed field of H. The correspondence

K ↦ Gal(L/K),

defined for all intermediate field extensions F ⊆ K ⊆ L, is an inclusion reversing bijection between the set of all intermediate extensions K and the set of all closed subgroups of G. Its inverse is the correspondence

H ↦ $L^H$,

defined for all closed subgroups H of G. The extension K/F is normal if and only if Gal(L/K) is a normal subgroup of G, and in this case the restriction map

G → Gal(K/F)

has kernel Gal(L/K).

Theorem 15 (Galois correspondence for finite subextensions). Let G, L, F be as before.

• Every open subgroup H ⊆ G is closed and has finite index in G.

• If H ⊆ G is an open subgroup, then the field extension $L^H/F$ is finite.

• For every intermediate field K with [K : F] finite, the Galois group Gal(L/K) is an open subgroup of G.
163.12 normal closure

Let K be an extension field of F. A normal closure of K/F is a field L ⊇ K that is a normal extension of F and is minimal in that respect, i.e. no proper subfield of L containing K is normal over F. If K is an algebraic extension of F, then a normal closure for K/F exists and is unique up to isomorphism.

163.13 normal extension

A field extension K/F is normal if every irreducible polynomial f ∈ F[x] which has at least one root in K splits (factors into a product of linear factors) in K[x].

An extension K/F of finite degree is normal if and only if there exists a polynomial p ∈ F[x] such that K is the splitting field for p over F.

163.14 perfect field

A perfect field is a field K such that any algebraic extension field L/K is separable over K. All fields of characteristic 0 are perfect, so in particular the fields R, C and Q are perfect.
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163.15 radical extension 


A radical tower is a\field extension] L/F which has a [filtration] 
PSC ee SL 


where for each i, 0 <i < n, there exists an element a; € L;,, and alnatural numberln; such 
that Liat = Lila) and a," E€ Li. 


A radical extension is a field extension K/F for which there exists a radical tower L/F with 
L > K. The notion of radical extension coincides with the informal concept of solving for 
the [roots] of a [polynomial] by [radicals] in the sense that a polynomial over K is bolvable! by 
radicals if and only if its splitting field) is a radical extension of F. 
163.16 separable


An irreducible polynomial $f \in F[x]$ with coefficients in a field $F$ is separable if $f$ factors into
distinct linear factors over a splitting field $K$ of $f$.


A polynomial $g$ with coefficients in $F$ is separable if each irreducible factor of $g$ in $F[x]$ is a
separable polynomial.


An algebraic field extension $K/F$ is separable if, for each $a \in K$, the minimal polynomial of $a$ over
$F$ is separable. When $F$ has characteristic zero, every algebraic extension is separable; examples
of inseparable extensions include the quotient field $K(u)[t]/(t^p - u)$ over the field $K(u)$ of
rational functions in one variable, where $K$ has characteristic $p > 0$.
163.17 separable closure


Let $K$ be a field and let $L$ be an algebraic closure of $K$. The separable closure of $K$ inside $L$
is the compositum of all finite separable extensions of $K$ contained in $L$ (that is to say, the
smallest subfield of $L$ that contains every finite separable extension of $K$).
Chapter 164


12F20 — Transcendental extensions


164.1 transcendence degree


The transcendence degree of a set $S$ over a field $K$, denoted $T_S$, is the size of the maximal
subset $S'$ of $S$ such that all the elements of $S'$ are algebraically independent.


The transcendence degree of a field extension $L$ over $K$ is the transcendence degree of
the minimal subset of $L$ needed to generate $L$ over $K$.


Heuristically speaking, the transcendence degree of a set $S$ is obtained by taking the
number of elements in the set, subtracting the number of algebraic elements in that set, and
then subtracting the number of algebraic relations between distinct pairs of elements in $S$.


Example 4 (Computing the Transcendence Degree). The set $S = \{\sqrt{7}, \pi, \pi^2, e\}$ has $T_S \le 2$
since there are four elements, $\sqrt{7}$ is algebraic, and the polynomial $f(x, y) = x^2 - y$ gives an
algebraic dependence between $\pi$ and $\pi^2$ (i.e. $(\pi, \pi^2)$ is a root of $f$), giving $T_S \le 4 - 1 - 1 = 2$.
If we assume the conjecture that $e$ and $\pi$ are algebraically independent, then no more
dependencies can exist, and we can conclude that, in fact, $T_S = 2$.
Chapter 165


12F99 — Miscellaneous


165.1 composite field


Let $\{K_\alpha\}$, $\alpha \in J$, be a collection of subfields of a field $L$. The composite field of the collection
is the smallest subfield of $L$ that contains all the fields $K_\alpha$.


The notation $K_1 K_2$ (resp., $K_1 K_2 \cdots K_n$) is often used to denote the composite field of two
(resp., finitely many) fields.
165.2 extension field


We say that a field $K$ is an extension of $F$ if $F$ is a subfield of $K$.
We usually denote $K$ being an extension of $F$ by: $F \subset K$, $F \le K$, $K/F$, or
\[ \begin{array}{c} K \\ | \\ F \end{array} \]


If $K$ is an extension of $F$, we can regard $K$ as a vector space over $F$. The dimension of
this space (which could possibly be infinite) is denoted $[K : F]$, and called the degree of the
extension.$^1$


1 The term "degree" reflects the fact that, in the more general setting of Dedekind domains and
scheme-theoretic algebraic curves, the degree of an extension of fields equals the algebraic degree of the
polynomial defining the projection map of the underlying curves.


One of the classic theorems on extensions states that if $F \subset K \subset L$, then
\[ [L : F] = [L : K]\,[K : F] \]
(in other words, degrees are multiplicative in towers).


Version: 4 Owner: drini Author(s): drini, djao 




Chapter 166 


12J15 — Ordered fields 


166.1 ordered field 


An ordered field is an ordered ring which is a field.
Chapter 167


13-00 — General reference works 
(handbooks, dictionaries, 
bibliographies, etc.) 


167.1 absolute value 


Let $R$ be an ordered ring and let $a \in R$. The absolute value of $a$ is given by the function
$|\cdot| : R \to R$ defined by
\[ |a| = \begin{cases} a & \text{if } a \ge 0, \\ -a & \text{otherwise.} \end{cases} \]


In particular, the usual absolute value $|\cdot|$ on the field $\mathbb{R}$ of real numbers is defined in this
manner.


Absolute value has a different meaning in the case of complex numbers: for a complex
number $z \in \mathbb{C}$, the absolute value $|z|$ of $z$ is defined to be $\sqrt{x^2 + y^2}$, where $z = x + yi$ and
$x, y \in \mathbb{R}$ are real.


All absolute value functions satisfy the defining properties of a valuation, including:


• $|a| \ge 0$ for all $a \in R$, with equality if and only if $a = 0$
• $|ab| = |a| \cdot |b|$ for all $a, b \in R$
• $|a + b| \le |a| + |b|$ for all $a, b \in R$ (triangle inequality)


However, in general they are not literally valuations, because valuations are required to be
real valued. In the case of $\mathbb{R}$ and $\mathbb{C}$, the absolute value is a valuation, and it induces a metric
in the usual way, with distance function defined by $d(x, y) := |x - y|$.
167.2 associates


Let $a, b$ be elements of a ring such that $a = bu$ where $u$ is a unit. Then we say that $a$ and $b$
are associates.


The "associate" property induces an equivalence relation on the ring.


On an integral domain, $a$ and $b$ are associates if and only if $(a) = (b)$, where $(a)$ denotes the
principal ideal generated by $a$.
167.3 cancellation ring


A ring $R$ is a cancellation ring if for all $a, b \in R$, if $a \cdot b = 0$ then either $a = 0$ or $b = 0$.
167.4 comaximal


Let $R$ be a ring. Two ideals $I, J$ of $R$ are comaximal if $I + J = R$ (i.e. if $1 = a + b$ for some
$a \in I$, $b \in J$).


Two distinct maximal ideals are comaximal.
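

As an illustration of the condition $1 = a + b$, the following is a minimal sketch for the special case $R = \mathbb{Z}$, where the ideals $(m)$ and $(n)$ are comaximal exactly when $\gcd(m, n) = 1$. The function names `extended_gcd` and `comaximal_witness` are hypothetical helpers introduced here, not part of the entry.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with a*x + b*y == g, where |g| = gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def comaximal_witness(m: int, n: int):
    """For the ideals (m) and (n) of Z, return (a, b) with a in (m),
    b in (n) and a + b == 1, or None if the ideals are not comaximal."""
    g, x, y = extended_gcd(m, n)
    if abs(g) != 1:
        return None
    # Multiply through by g (which is +1 or -1) so the sum is exactly 1.
    return (g * m * x, g * n * y)
```

For example, `comaximal_witness(4, 9)` returns a pair whose first entry is a multiple of 4, whose second entry is a multiple of 9, and whose sum is 1, while `comaximal_witness(4, 6)` returns `None`.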
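167.5 every prime ideal is radical


Let $R$ be a commutative ring and let $\mathfrak{P}$ be a prime ideal of $R$.
Proposition 8. Every prime ideal $\mathfrak{P}$ of $R$ is a radical ideal, i.e.
\[ \mathfrak{P} = \mathrm{Rad}(\mathfrak{P}). \]


Recall that $\mathfrak{P} \subsetneq R$ is a prime ideal if and only if for any $a, b \in R$
\[ a \cdot b \in \mathfrak{P} \Rightarrow a \in \mathfrak{P} \text{ or } b \in \mathfrak{P}. \]
Also, recall that
\[ \mathrm{Rad}(\mathfrak{P}) = \{ r \in R \mid \exists n \in \mathbb{N} \text{ such that } r^n \in \mathfrak{P} \}. \]


Obviously, we have $\mathfrak{P} \subseteq \mathrm{Rad}(\mathfrak{P})$ (just take $n = 1$), so it remains to show the reverse
inclusion.


Suppose $r \in \mathrm{Rad}(\mathfrak{P})$, so there exists some $n \in \mathbb{N}$ such that $r^n \in \mathfrak{P}$. We want to prove that
$r$ must be an element of the prime ideal $\mathfrak{P}$. For this, we use induction on $n$ to prove the
following proposition:


For all $n \in \mathbb{N}$, for all $r \in R$, $r^n \in \mathfrak{P} \Rightarrow r \in \mathfrak{P}$.


Case $n = 1$: This is clear, $r \in \mathfrak{P} \Rightarrow r \in \mathfrak{P}$.


Case $n$ $\Rightarrow$ Case $n + 1$: Suppose we have proved the proposition for the case $n$, so our
induction hypothesis is
\[ \forall r \in R, \quad r^n \in \mathfrak{P} \Rightarrow r \in \mathfrak{P}, \]
and suppose $r^{n+1} \in \mathfrak{P}$. Then
\[ r \cdot r^n = r^{n+1} \in \mathfrak{P}, \]
and since $\mathfrak{P}$ is a prime ideal we have
\[ r \in \mathfrak{P} \text{ or } r^n \in \mathfrak{P}. \]
Thus we conclude, either directly or using the induction hypothesis, that $r \in \mathfrak{P}$ as desired.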
167.6 module


Let $R$ be a ring with identity. A left module $M$ over $R$ is a set with two binary operations,
$+ : M \times M \to M$ and $\cdot : R \times M \to M$, such that


1. $(a + b) + c = a + (b + c)$ for all $a, b, c \in M$
2. $a + b = b + a$ for all $a, b \in M$
3. There exists an element $0 \in M$ such that $a + 0 = a$ for all $a \in M$
4. For any $a \in M$, there exists an element $b \in M$ such that $a + b = 0$
5. $r_1 \cdot (r_2 \cdot m) = (r_1 \cdot r_2) \cdot m$ for all $r_1, r_2 \in R$ and $m \in M$
6. $1 \cdot m = m$ for all $m \in M$
7. $r \cdot (m + n) = (r \cdot m) + (r \cdot n)$ for all $r \in R$ and $m, n \in M$
8. $(r_1 + r_2) \cdot m = (r_1 \cdot m) + (r_2 \cdot m)$ for all $r_1, r_2 \in R$ and $m \in M$


A right module is defined analogously, except that the function $\cdot$ goes from $M \times R$ to $M$. If $R$
is commutative, there is an equivalence of categories between the category of left $R$-modules
and the category of right $R$-modules.
167.7 radical of an ideal


Let $R$ be a commutative ring. For any ideal $I$ of $R$, the radical of $I$, written $\mathrm{Rad}(I)$, is the
set
\[ \{ a \in R : a^n \in I \text{ for some integer } n > 0 \}. \]


The radical of an ideal $I$ is always an ideal of $R$.


If $I = \mathrm{Rad}(I)$, then $I$ is called a radical ideal.


Every prime ideal is a radical ideal. If $I$ is a radical ideal, the quotient ring $R/I$ is a
ring with no nonzero nilpotent elements.
167.8 ring


A ring is a set $R$ together with two binary operations, denoted $+ : R \times R \to R$ and
$\cdot : R \times R \to R$, such that


1. $(a + b) + c = a + (b + c)$ and $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ for all $a, b, c \in R$ (associative law)
2. $a + b = b + a$ for all $a, b \in R$ (commutative law)
3. There exists an element $0 \in R$ such that $a + 0 = a$ for all $a \in R$
4. For all $a \in R$, there exists $b \in R$ such that $a + b = 0$ (additive inverse)
5. $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$ and $(a + b) \cdot c = (a \cdot c) + (b \cdot c)$ for all $a, b, c \in R$ (distributive law)


Equivalently, a ring is an abelian group $(R, +)$ together with a second binary operation $\cdot$
such that $\cdot$ is associative and distributes over $+$.


We say $R$ has a multiplicative identity if there exists an element $1 \in R$ such that $a \cdot 1 = 1 \cdot a = a$
for all $a \in R$. We say $R$ is commutative if $a \cdot b = b \cdot a$ for all $a, b \in R$.


Every element $a$ in a ring has a unique additive inverse, denoted $-a$. The subtraction
operator in a ring is defined by the equation $a - b := a + (-b)$.


Version: 6 Owner: djao Author(s): djao 


167.9 subring 


Let $(A, +, *)$ be a ring. A subring is a subset $S$ of $A$ with the operations $+$ and $*$ of $A$ restricted
to $S$ and such that $S$ is a ring by itself.


Since the restricted operations inherit associativity, commutativity of $+$, etc., usually
only closure (under $+$, additive inverses, and $*$) has to be checked.


A subring is called an ideal if whenever $s \in S$ and $a \in A$, it happens that $sa \in S$. In
ring theory, ideals are far more important than subrings (since they play the role analogous to
that of normal subgroups for groups).


Example:


Consider the ring $(\mathbb{Z}, +, \cdot)$. Then $(2\mathbb{Z}, +, \cdot)$ is a subring since the sum or product of two
even numbers is again an even number.
167.10 tensor product


Summary. The tensor product is a formal bilinear multiplication of two modules or
vector spaces. In essence, it permits us to replace bilinear maps from two such objects
by an equivalent linear map from the tensor product of the two objects. The origin of this
operation lies in classic differential geometry and physics, which had need of multiply indexed
geometric objects such as the first and second fundamental forms, and the stress tensor;
see Tensor Product (Classical).


Definition (Standard). Let $R$ be a commutative ring and let $A, B$ be $R$-modules. There
exists an $R$-module $A \otimes B$, called the tensor product of $A$ and $B$ over $R$, together with a
canonical bilinear homomorphism
\[ \otimes : A \times B \to A \otimes B, \]
distinguished, up to isomorphism, by the following universal property. Every bilinear $R$-module
homomorphism
\[ \phi : A \times B \to C \]
lifts to a unique $R$-module homomorphism
\[ \tilde{\phi} : A \otimes B \to C \]
such that
\[ \phi(a, b) = \tilde{\phi}(a \otimes b) \]
for all $a \in A$, $b \in B$. Diagrammatically:
\[
\begin{array}{ccc}
A \times B & \xrightarrow{\ \otimes\ } & A \otimes B \\
 & \searrow^{\phi} & \downarrow {\scriptstyle \exists!\, \tilde{\phi}} \\
 & & C
\end{array}
\]


The tensor product $A \otimes B$ can be constructed by taking the free $R$-module generated by all
formal symbols
\[ a \otimes b, \quad a \in A,\ b \in B, \]
and quotienting by the obvious bilinear relations:
\[ (a_1 + a_2) \otimes b = a_1 \otimes b + a_2 \otimes b, \quad a_1, a_2 \in A,\ b \in B \]
\[ a \otimes (b_1 + b_2) = a \otimes b_1 + a \otimes b_2, \quad a \in A,\ b_1, b_2 \in B \]
\[ r(a \otimes b) = (ra) \otimes b = a \otimes (rb), \quad a \in A,\ b \in B,\ r \in R \]


Definition (Categorical). Using the language of categories, all of the above can be
expressed quite simply by stating that for all $R$-modules $M$, the functor $(-) \otimes M$ is left-adjoint
to the functor $\mathrm{Hom}(M, -)$.
Chapter 168


13-XX — Commutative rings and
algebras


168.1 commutative ring


Let $(X, +, *)$ be a ring. Since $(X, +)$ is required to be an abelian group, the operation $+$
necessarily is commutative. This need not hold for $*$. Rings where $*$ is commutative,
that is, $x * y = y * x$ for all $x, y \in X$, are called commutative rings.
Chapter 169


13A02 — Graded rings


169.1 graded ring


Let $G$ be an abelian group. A $G$-graded ring $R$ is a direct sum $R = \bigoplus_{g \in G} R_g$ indexed with
the property that $R_g R_h \subseteq R_{gh}$.
Chapter 170


13A05 — Divisibility


170.1 Eisenstein criterion


theorem:


Let $f$ be a primitive polynomial over a unique factorization domain $R$, say
\[ f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n. \]


If $R$ has an irreducible element $p$ such that
\[ p \mid a_m \quad (0 \le m < n), \qquad p^2 \nmid a_0, \qquad p \nmid a_n, \]
then $f$ is irreducible.


proof:


Suppose
\[ f = (b_0 + \cdots + b_s x^s)(c_0 + \cdots + c_t x^t) \]
where $s > 0$ and $t > 0$. Since $a_0 = b_0 c_0$, we know that $p$ divides one but not both of $b_0$ and
$c_0$; suppose $p \mid c_0$. By hypothesis, not all the $c_m$ are divisible by $p$ (otherwise $p$ would divide
$a_n = b_s c_t$); let $k$ be the smallest index such that $p \nmid c_k$. We have
$a_k = b_0 c_k + b_1 c_{k-1} + \cdots + b_k c_0$. Since $k \le t < n$, we also have $p \mid a_k$, and $p$
divides every summand on the right side except $b_0 c_k$, which yields a contradiction. QED
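

The hypotheses of the criterion are easy to test mechanically. The following is a minimal sketch for the special case $R = \mathbb{Z}$, assuming the caller supplies a primitive polynomial as a coefficient list and a prime $p$ (neither assumption is checked). The function name `eisenstein_applies` is a hypothetical helper, not part of the entry.

```python
def eisenstein_applies(coeffs: list[int], p: int) -> bool:
    """coeffs = [a_0, a_1, ..., a_n] with integer entries; p is assumed prime.

    Returns True when p | a_m for 0 <= m < n, p*p does not divide a_0,
    and p does not divide a_n.
    """
    *lower, leading = coeffs
    if not lower:
        return False  # constant polynomials are outside the criterion
    return (
        all(a % p == 0 for a in lower)
        and coeffs[0] % (p * p) != 0
        and leading % p != 0
    )
```

For instance, `eisenstein_applies([5, 10, 0, 0, 1], 5)` is true for $x^4 + 10x + 5$. A `False` result only means the criterion does not apply for that $p$; it says nothing about reducibility (the call with `[4, 0, 1]` and `p = 2` fails, although $x^2 + 4$ is irreducible over $\mathbb{Q}$).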
Chapter 171


13A10 — Radical theory


171.1 Hilbert’s Nullstellensatz


Let $K$ be an algebraically closed field, and let $I$ be an ideal in $K[x_1, \ldots, x_n]$, the
polynomial ring in $n$ indeterminates.


Define $V(I)$, the zero set of $I$, by
\[ V(I) = \{ (a_1, \ldots, a_n) \in K^n \mid f(a_1, \ldots, a_n) = 0 \text{ for all } f \in I \}. \]


Weak Nullstellensatz:
If $V(I) = \emptyset$, then $I = K[x_1, \ldots, x_n]$. In other words, the zero set of any proper ideal of
$K[x_1, \ldots, x_n]$ is nonempty.


Hilbert’s (Strong) Nullstellensatz:
Suppose $f \in K[x_1, \ldots, x_n]$ satisfies $f(a_1, \ldots, a_n) = 0$ for every $(a_1, \ldots, a_n) \in V(I)$. Then
$f^r \in I$ for some integer $r > 0$.


In the language of algebraic geometry, the latter result is equivalent to the statement that
$\mathrm{Rad}(I) = I(V(I))$.
171.2 nilradical


Let $R$ be a commutative ring. An element $x \in R$ is said to be nilpotent if $x^n = 0$ for
some positive integer $n$. The set of all nilpotent elements of $R$ is an ideal of $R$, called the
nilradical of $R$ and denoted $\mathrm{Nil}(R)$. The nilradical is so named because it is the radical of
the zero ideal.


The nilradical of $R$ equals the prime radical of $R$, although proving that the two are equivalent
requires the axiom of choice.
171.3 radical of an integer


Given a natural number $n$, let $n = p_1^{\alpha_1} \cdots p_k^{\alpha_k}$ be its unique factorization as a product of
distinct prime powers. Define the radical of $n$, denoted $\mathrm{rad}(n)$, to be the product $p_1 \cdots p_k$. This is
the square-free part of the integer, and thus the radical of a square-free number is itself.
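

The definition can be tried out directly. The sketch below computes $\mathrm{rad}(n)$ by trial division and, as a cross-check under the assumption $R = \mathbb{Z}/n\mathbb{Z}$, lists the nilpotent residues by brute force; these turn out to be the multiples of $\mathrm{rad}(n)$. The names `rad` and `nilpotents_mod` are hypothetical helpers chosen for this illustration.

```python
def rad(n: int) -> int:
    """Product of the distinct primes dividing n (trial division); rad(1) == 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:  # whatever is left is a single prime factor
        result *= n
    return result


def nilpotents_mod(n: int) -> list[int]:
    """Brute-force list of residues x in Z/nZ with x**k == 0 (mod n) for some k >= 1."""
    return [x for x in range(n) if any(pow(x, k, n) == 0 for k in range(1, n + 1))]
```

With $n = 72 = 2^3 \cdot 3^2$ this gives `rad(72) == 6`, and `nilpotents_mod(72)` is the list of multiples of 6 below 72.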
Chapter 172


13A15 — Ideals; multiplicative ideal
theory


172.1 contracted ideal


Let $f : A \to B$ be a ring homomorphism. Let $\mathfrak{b}$ be an ideal in $B$. Then it is easy to show
that the inverse image of $\mathfrak{b}$, that is $f^{-1}(\mathfrak{b})$, is an ideal in $A$, and we call it a contracted ideal.


A common notation for the contracted ideal in this case is $\mathfrak{b}^c$.
172.2 existence of maximal ideals


Let $R \neq 0$ be a commutative ring with identity. Is there a maximal ideal in $R$? This simple
property turns out to be dependent on the axiom of choice. Assuming Zorn’s lemma, which
is equivalent to the axiom of choice, we are able to prove the following:


Proposition 9. Every ring $R$ (as above) has a maximal ideal.


Let $\Sigma$ be the partially ordered set
\[ \Sigma = \{ A \mid A \text{ is an ideal of } R,\ A \neq R \} \]
ordered by inclusion.
Since $0 \in R$, the ideal generated by 0 satisfies $(0) \in \Sigma$, because $(0) \neq R$. Hence $\Sigma$ is non-empty.


In order to apply Zorn’s lemma we need to prove that every chain in $\Sigma$ has an upper bound
that belongs to $\Sigma$. Let $\{A_\alpha\}$ be a chain of ideals in $\Sigma$, so for all indices $\alpha, \beta$ we have
\[ A_\alpha \subseteq A_\beta \quad \text{or} \quad A_\beta \subseteq A_\alpha. \]
We claim that $B$, defined by
\[ B = \bigcup_\alpha A_\alpha, \]
is such an upper bound.


• $B$ is an ideal. Indeed, let $a, b \in B$, so there exist $\alpha, \beta$ such that $a \in A_\alpha$, $b \in A_\beta$. Since
these two ideals are in a chain we have
\[ A_\alpha \subseteq A_\beta \quad \text{or} \quad A_\beta \subseteq A_\alpha. \]
Without loss of generality, we assume $A_\alpha \subseteq A_\beta$. Then both $a, b \in A_\beta$, and $A_\beta$ is an
ideal of the ring $R$. Thus $a + b \in A_\beta \subseteq B$.


Similarly, let $r \in R$ and $b \in B$. As above, there exists $\beta$ such that $b \in A_\beta$. Since $A_\beta$ is
an ideal we have
\[ r \cdot b \in A_\beta \subseteq B. \]
Therefore, $B$ is an ideal.


• $B \neq R$, otherwise 1 would belong to $B$, so there would be an $\alpha$ such that $1 \in A_\alpha$, so
$A_\alpha = R$. But this is impossible because we assumed $A_\alpha \in \Sigma$ for all indices $\alpha$.


Therefore $B \in \Sigma$. Hence every chain in $\Sigma$ has an upper bound in $\Sigma$ and we can apply Zorn’s
lemma to deduce the existence of $M$, a maximal element (with respect to inclusion) in $\Sigma$.
By definition of the set $\Sigma$, $M$ must be a maximal ideal in $R$.


NOTE: Assuming that the axiom of choice does NOT hold, mathematicians have shown
the existence of commutative rings (with 1) that have no maximal ideals.
172.3 extended ideal


Let $f : A \to B$ be a ring map and let $\mathfrak{a}$ be an ideal in $A$. We can look at the ideal generated
by the image of $\mathfrak{a}$, which is called an extended ideal and is denoted by $\mathfrak{a}^e$.


It is not true in general that if $\mathfrak{a}$ is an ideal in $A$, the image of $\mathfrak{a}$ under $f$ will be an ideal in
$B$. (For example, consider the embedding $f : \mathbb{Z} \to \mathbb{Q}$. The image of the ideal $(2) \subset \mathbb{Z}$ is not
an ideal in $\mathbb{Q}$, since the only ideals in $\mathbb{Q}$ are $\{0\}$ and all of $\mathbb{Q}$.)
172.4 fractional ideal


172.4.1 Basics


Let $A$ be an integral domain with field of fractions $K$. Then $K$ is an $A$-module, and we
define a fractional ideal of $A$ to be a submodule of $K$ which is finitely generated as an
$A$-module.


The product of two fractional ideals $\mathfrak{a}$ and $\mathfrak{b}$ of $A$ is defined to be the submodule of $K$
generated by all the products $x \cdot y \in K$, for $x \in \mathfrak{a}$ and $y \in \mathfrak{b}$. This product is denoted
$\mathfrak{a} \cdot \mathfrak{b}$, and it is always a fractional ideal of $A$ as well. Note that, if $A$ itself is considered as
a fractional ideal of $A$, then $\mathfrak{a} \cdot A = \mathfrak{a}$. Accordingly, the set of fractional ideals is always a
monoid under this product operation, with identity element $A$.


We say that a fractional ideal $\mathfrak{a}$ is invertible if there exists a fractional ideal $\mathfrak{a}'$ such that
$\mathfrak{a} \cdot \mathfrak{a}' = A$. It can be shown that if $\mathfrak{a}$ is invertible, then its inverse must be $\mathfrak{a}' = (A : \mathfrak{a})$, the
annihilator$^1$ of $\mathfrak{a}$ in $A$.


172.4.2 Fractional ideals in Dedekind domains


We now suppose that $A$ is a Dedekind domain. In this case, every nonzero fractional ideal
is invertible, and consequently the nonzero fractional ideals in $A$ form a group under ideal
multiplication, called the ideal group of $A$.


The unique factorization of ideals theorem states that every nonzero fractional ideal in $A$ factors
uniquely into a finite product of prime ideals of $A$ and their (fractional ideal) inverses. It
follows that the ideal group of $A$ is freely generated as an abelian group by the nonzero prime
ideals of $A$.


A fractional ideal of $A$ is said to be principal if it is generated as an $A$-module by a single
element. The set of nonzero principal fractional ideals is a subgroup of the ideal group of $A$,
and the quotient group of the ideal group of $A$ by the subgroup of principal fractional ideals
is nothing other than the ideal class group of $A$.


1 In general, for any fractional ideals $\mathfrak{a}$ and $\mathfrak{b}$, the annihilator of $\mathfrak{b}$ in $\mathfrak{a}$ is the fractional ideal $(\mathfrak{a} : \mathfrak{b})$
consisting of all $x \in K$ such that $x \cdot \mathfrak{b} \subset \mathfrak{a}$.
172.5 homogeneous ideal


An ideal of a graded ring generated by homogeneous elements is said to be homogeneous. The most
common example is the polynomial ring $K[x_1, x_2, \ldots, x_n]$, where $K$ is a field: an ideal of this ring is
said to be homogeneous if it is generated by polynomials, each of which is homogeneous.
172.6 ideal


Let $R$ be a ring. A left ideal (resp., right ideal) $I$ of $R$ is a nonempty subset $I \subset R$ such that:


• $a + b \in I$ for all $a, b \in I$


• $r \cdot a \in I$ (resp. $a \cdot r \in I$) for all $a \in I$ and $r \in R$


A 2-sided ideal is a left ideal $I$ which is also a right ideal. If $R$ is a commutative ring, then
these three notions of ideal are equivalent.
172.7 maximal ideal


Let $R$ be a ring with identity. A proper left (right, two-sided) ideal $\mathfrak{m} \subsetneq R$ is said to be
maximal if $\mathfrak{m}$ is not a proper subset of any other proper left (right, two-sided) ideal of $R$.


One can prove:


• A left ideal $\mathfrak{m}$ is maximal if and only if $R/\mathfrak{m}$ is a simple left $R$-module.
• A right ideal $\mathfrak{m}$ is maximal if and only if $R/\mathfrak{m}$ is a simple right $R$-module.
• A two-sided ideal $\mathfrak{m}$ is maximal if and only if $R/\mathfrak{m}$ is a simple ring.


All maximal ideals are prime ideals. If $R$ is commutative, an ideal $\mathfrak{m} \subset R$ is maximal if and
only if the quotient ring $R/\mathfrak{m}$ is a field.


Version: 3 Owner: djao Author(s): djao 




172.8 principal ideal 


Let $R$ be a ring and let $a \in R$. The principal left (resp. right, 2-sided) ideal of $a$ is the
smallest left (resp. right, 2-sided) ideal of $R$ containing the element $a$.


When $R$ is a commutative ring, the principal ideal of $a$ is denoted $(a)$.


Version: 2 Owner: djao Author(s): djao 


172.9 the set of prime ideals of a commutative ring 
with identity 


the set of prime ideals of a commutative ring with identity 


notation 


1: Spectrum of a ring
2: Spec(R) 


Note: This is a “seed” entry written using a short-hand format described in this FAQ 
Chapter 173


13A50 — Actions of groups on
commutative rings; invariant theory


173.1 Schwarz (1975) theorem


theorem:


Let $\Gamma$ be a compact Lie group acting on $V$. Let $u_1, \ldots, u_s$ be a Hilbert basis
for the $\Gamma$-invariant polynomials $\mathcal{P}(\Gamma)$ (see Hilbert-Weyl theorem). Let $f \in \mathcal{E}(\Gamma)$.
Then there exists a smooth germ $h \in \mathcal{E}_s$ (the ring of $C^\infty$ germs $\mathbb{R}^s \to \mathbb{R}$) such
that $f(x) = h(u_1(x), \ldots, u_s(x))$. [GVL]


proof:
The proof is shown on page 58 of [GVL].


theorem: (as stated by Gerald W. Schwarz)


Let $G$ be a compact Lie group acting orthogonally on $\mathbb{R}^n$, let $\rho_1, \ldots, \rho_k$ be generators
of $\mathcal{P}(\mathbb{R}^n)^G$ (the set of $G$-invariant polynomials on $\mathbb{R}^n$), and let
$\rho = (\rho_1, \ldots, \rho_k) : \mathbb{R}^n \to \mathbb{R}^k$. Then $\rho^* \mathcal{E}(\mathbb{R}^k) = \mathcal{E}(\mathbb{R}^n)^G$. [SG]


proof:


The proof is shown in the following publication [SG].
173.2 invariant polynomial


An invariant polynomial is a polynomial $P$ that is invariant under a (compact) Lie group
$\Gamma$ acting on a vector space $V$. That is, $P$ is a $\Gamma$-invariant polynomial if $P(\gamma x) = P(x)$ for all
$\gamma \in \Gamma$ and $x \in V$.
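Chapter 174


13A99 — Miscellaneous


174.1 Lagrange’s identity


Let $R$ be a commutative ring, and let $a_1, \ldots, a_n, b_1, \ldots, b_n$ be arbitrary elements in $R$. Then
\[ \left( \sum_{k=1}^n a_k b_k \right)^2 = \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right) - \sum_{1 \le k < i \le n} (a_k b_i - a_i b_k)^2. \]


Proof:


Write $x_i = a_i$ and $y_i = b_i$ ($i = 1, \ldots, n$). The ring $R$ from which we take the $x_i, y_i$ is
commutative, so we can apply the binomial formula. We start out with
\[ \left( \sum_{i=1}^n x_i y_i \right)^2 = \sum_{i=1}^n x_i^2 y_i^2 + \sum_{\substack{i,j=1 \\ i \neq j}}^n x_i y_j x_j y_i. \tag{174.1.1} \]


Using the binomial theorem we see that
\[ (x_i y_j - x_j y_i)^2 = x_i^2 y_j^2 - 2 x_i x_j y_i y_j + x_j^2 y_i^2. \]


So we get
\[ \left( \sum_{i=1}^n x_i y_i \right)^2 + \frac{1}{2} \sum_{\substack{i,j=1 \\ i \neq j}}^n (x_i y_j - x_j y_i)^2 = \sum_{i=1}^n x_i^2 y_i^2 + \frac{1}{2} \sum_{\substack{i,j=1 \\ i \neq j}}^n (x_i^2 y_j^2 + x_j^2 y_i^2) \tag{174.1.2} \]
\[ = \left( \sum_{i=1}^n x_i^2 \right) \left( \sum_{i=1}^n y_i^2 \right). \tag{174.1.3} \]


Note that changing the roles of $i$ and $j$ in $x_i y_j - x_j y_i$, we get
\[ x_j y_i - x_i y_j = -(x_i y_j - x_j y_i). \]
But this doesn’t matter when we square, so each unordered pair $\{i, j\}$ contributes the same term
twice to the sum over $i \neq j$, and we can rewrite the last equation to
\[ \left( \sum_{i=1}^n x_i y_i \right)^2 + \sum_{1 \le i < j \le n} (x_i y_j - x_j y_i)^2 = \left( \sum_{i=1}^n x_i^2 \right) \left( \sum_{i=1}^n y_i^2 \right). \tag{174.1.4} \]


This is equivalent to the stated identity.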
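

As a sanity check on the identity, here is a minimal numerical sketch over $R = \mathbb{Z}$ (an assumption made for convenience; any commutative ring would do). The function name `lagrange_sides` is a hypothetical helper that evaluates both sides separately so they can be compared.

```python
def lagrange_sides(a: list[int], b: list[int]) -> tuple[int, int]:
    """Return (lhs, rhs) of Lagrange's identity for equal-length integer lists."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    n = len(a)
    lhs = sum(a[k] * b[k] for k in range(n)) ** 2
    # Sum of (a_k b_i - a_i b_k)^2 over all index pairs with k < i.
    cross = sum(
        (a[k] * b[i] - a[i] * b[k]) ** 2
        for k in range(n)
        for i in range(k + 1, n)
    )
    rhs = sum(x * x for x in a) * sum(y * y for y in b) - cross
    return lhs, rhs
```

For `a = [1, 2, 3]` and `b = [4, 5, 6]` both sides evaluate to 1024.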
174.2 characteristic


The concept of characteristic that exists for integral domains can be generalized for cyclic rings.
By extending the existing definition in this manner, though, characteristics would no longer
have to be 0 or prime.


The characteristic of an infinite cyclic ring is 0. Let $R$ be an infinite cyclic ring and $r$ be
a generator of the additive group of $R$. If $z \in \mathbb{Z}$ is such that $zr = 0_R$, then $z = 0$. Since no
positive integer $c$ exists such that $cr = 0_R$, it follows that $R$ has characteristic 0.


A finite ring is cyclic if and only if its order and characteristic are equal. If $R$ is a cyclic
ring and $r$ is a generator of the additive group of $R$, then $|r| = |R|$. Since, for every $s \in R$,
$|s|$ divides $|R|$, it follows that $\operatorname{char} R = |R|$. Conversely, if $R$ is a finite ring such that
$\operatorname{char} R = |R|$, then the exponent of the additive group of $R$ is also equal to $|R|$. Thus, there
exists $t \in R$ such that $|t| = |R|$. Since $\langle t \rangle$ is a subgroup of the additive group of $R$ and
$|\langle t \rangle| = |t| = |R|$, it follows that $R$ is a cyclic ring.
174.3 cyclic ring


A ring is a cyclic ring if its additive group is cyclic.


Every cyclic ring is commutative under multiplication. For if $R$ is a cyclic ring, $r$ is a
generator of the additive group of $R$, and $s, t \in R$, then there exist $a, b \in \mathbb{Z}$ such that $s = ar$
and $t = br$. As a result, $st = (ar)(br) = (ab)r^2 = (ba)r^2 = (br)(ar) = ts$. (Note the disguised
use of the distributive property.)


A result of the fundamental theorem of finite abelian groups is that every ring with square-free
order is a cyclic ring.


If $n$ is a positive integer, then, up to isomorphism, there are exactly $\tau(n)$ cyclic rings of order
$n$, where $\tau$ refers to the tau function. Also, if a cyclic ring has order $n$, then it has exactly
$\tau(n)$ subrings. This result mainly follows from Lagrange’s theorem and its converse. Note
that the converse of Lagrange’s theorem does not hold in general, but it does hold for finite
cyclic groups.


Every subring of a cyclic ring is a cyclic ring. Moreover, every subring of a cyclic ring is an
ideal.


$R$ is a finite cyclic ring of order $n$ if and only if there exists a positive divisor $k$ of $n$ such
that $R$ is isomorphic to $k\mathbb{Z}_{kn}$. $R$ is an infinite cyclic ring that has no zero divisors if and
only if there exists a positive integer $k$ such that $R$ is isomorphic to $k\mathbb{Z}$. Finally, $R$ is an
infinite cyclic ring that has zero divisors if and only if it is isomorphic to the following subset
of $M_{2 \times 2}(\mathbb{Z})$:
\[ \left\{ \begin{pmatrix} c & -c \\ c & -c \end{pmatrix} \;\middle|\; c \in \mathbb{Z} \right\}. \]
Thus, any infinite cyclic ring that has zero divisors is a zero ring.
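

The classification of finite cyclic rings as $k\mathbb{Z}_{kn}$ can be explored by brute force. The sketch below, under the assumption that $k\mathbb{Z}_{kn}$ means the multiples of $k$ inside $\mathbb{Z}/kn\mathbb{Z}$, lists one ring per positive divisor $k$ of $n$ and checks closure and the zero-ring property. All function names are hypothetical helpers for this illustration.

```python
def divisors(n: int) -> list[int]:
    """Positive divisors of n; their number is tau(n)."""
    return [k for k in range(1, n + 1) if n % k == 0]


def ring_elements(k: int, n: int) -> list[int]:
    """Elements of kZ_{kn}: the n multiples of k modulo k*n."""
    return list(range(0, k * n, k))


def is_closed_ring(k: int, n: int) -> bool:
    """Check closure of kZ_{kn} under addition and multiplication mod k*n."""
    mod = k * n
    elems = set(ring_elements(k, n))
    return all(
        (a + b) % mod in elems and (a * b) % mod in elems
        for a in elems
        for b in elems
    )


def is_zero_ring(k: int, n: int) -> bool:
    """True when every product in kZ_{kn} vanishes mod k*n."""
    mod = k * n
    elems = ring_elements(k, n)
    return all((a * b) % mod == 0 for a in elems for b in elems)
```

For $n = 4$ the divisors 1, 2, 4 give $\mathbb{Z}_4$, $2\mathbb{Z}_8$ and $4\mathbb{Z}_{16}$; only the last of these is a zero ring.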
Version: 15 Owner: Wkbj79 Author(s): Wkbj79 

174.4 proof of Euler four-square identity 
Using Lagrange’s identity, we have
\[ \left( \sum_{k=1}^4 x_k y_k \right)^2 = \left( \sum_{k=1}^4 x_k^2 \right) \left( \sum_{k=1}^4 y_k^2 \right) - \sum_{1 \le k < i \le 4} (x_k y_i - x_i y_k)^2. \tag{174.4.1} \]


We group the six squares into 3 groups of two squares and rewrite:
\[ (x_1 y_2 - x_2 y_1)^2 + (x_3 y_4 - x_4 y_3)^2 = ((x_1 y_2 - x_2 y_1) + (x_3 y_4 - x_4 y_3))^2 - 2 (x_1 y_2 - x_2 y_1)(x_3 y_4 - x_4 y_3) \tag{174.4.2} \]
\[ (x_1 y_3 - x_3 y_1)^2 + (x_2 y_4 - x_4 y_2)^2 = ((x_1 y_3 - x_3 y_1) - (x_2 y_4 - x_4 y_2))^2 + 2 (x_1 y_3 - x_3 y_1)(x_2 y_4 - x_4 y_2) \tag{174.4.3} \]
\[ (x_1 y_4 - x_4 y_1)^2 + (x_2 y_3 - x_3 y_2)^2 = ((x_1 y_4 - x_4 y_1) + (x_2 y_3 - x_3 y_2))^2 - 2 (x_1 y_4 - x_4 y_1)(x_2 y_3 - x_3 y_2). \tag{174.4.4} \]


Using
\[ -2 (x_1 y_2 - x_2 y_1)(x_3 y_4 - x_4 y_3) + 2 (x_1 y_3 - x_3 y_1)(x_2 y_4 - x_4 y_2) - 2 (x_1 y_4 - x_4 y_1)(x_2 y_3 - x_3 y_2) = 0 \tag{174.4.6} \]
we get
\[ \sum_{1 \le k < i \le 4} (x_k y_i - x_i y_k)^2 = ((x_1 y_2 - x_2 y_1) + (x_3 y_4 - x_4 y_3))^2 + ((x_1 y_3 - x_3 y_1) - (x_2 y_4 - x_4 y_2))^2 + ((x_1 y_4 - x_4 y_1) + (x_2 y_3 - x_3 y_2))^2 \tag{174.4.7} \]
by adding equations 174.4.2 to 174.4.4. We put the result of equation 174.4.7 into 174.4.1 and
get
\[ \left( \sum_{k=1}^4 x_k y_k \right)^2 = \left( \sum_{k=1}^4 x_k^2 \right) \left( \sum_{k=1}^4 y_k^2 \right) - (x_1 y_2 - x_2 y_1 + x_3 y_4 - x_4 y_3)^2 - (x_1 y_3 - x_3 y_1 + x_4 y_2 - x_2 y_4)^2 - (x_1 y_4 - x_4 y_1 + x_2 y_3 - x_3 y_2)^2, \tag{174.4.9} \]
which is equivalent to the claimed identity.
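

The final equation can be checked numerically. This is a minimal sketch over the integers, with the hypothetical helper name `four_square_sides`; it evaluates the left and right sides of equation 174.4.9 for given 4-tuples.

```python
def four_square_sides(x, y):
    """Return (lhs, rhs) of equation 174.4.9 for 4-tuples x and y."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    lhs = (x1 * y1 + x2 * y2 + x3 * y3 + x4 * y4) ** 2
    rhs = (
        (x1**2 + x2**2 + x3**2 + x4**2) * (y1**2 + y2**2 + y3**2 + y4**2)
        - (x1 * y2 - x2 * y1 + x3 * y4 - x4 * y3) ** 2
        - (x1 * y3 - x3 * y1 + x4 * y2 - x2 * y4) ** 2
        - (x1 * y4 - x4 * y1 + x2 * y3 - x3 * y2) ** 2
    )
    return lhs, rhs
```

For `x = (1, 2, 3, 4)` and `y = (5, 6, 7, 8)` both sides equal 4900, so $30 \cdot 174 = 70^2 + 8^2 + 0^2 + 16^2$ is written as a sum of four squares.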
174.5 proof that every subring of a cyclic ring is a
cyclic ring


Following is a proof that every subring of a cyclic ring is a cyclic ring.


Let $R$ be a cyclic ring and $S$ be a subring of $R$. Then the additive group of $S$ is a subgroup
of the additive group of $R$. By definition of cyclic ring, the additive group of $R$ is cyclic.
Thus, the additive group of $S$ is cyclic. It follows that $S$ is a cyclic ring.
174.6 proof that every subring of a cyclic ring is an
ideal


Following is a proof that every subring of a cyclic ring is an ideal.


Let $R$ be a cyclic ring and $S$ be a subring of $R$. Then $R$ and $S$ are both cyclic rings. Let $r$
be a generator of the additive group of $R$ and $s$ be a generator of the additive group of $S$.
Since $s \in S$ and $S$ is a subring of $R$, then $s \in R$. Thus, there exists $z \in \mathbb{Z}$ with $s = zr$.


Let $t \in R$ and $u \in S$. Since $u \in S$ and $S$ is a subring of $R$, then $u \in R$. Since multiplication
is commutative in a cyclic ring, then $tu = ut$. Since $t \in R$, then there exists $a \in \mathbb{Z}$ with
$t = ar$. Since $u \in S$, then there exists $b \in \mathbb{Z}$ with $u = bs$.


Since $R$ is a ring, then $r^2 \in R$. Thus, there exists $k \in \mathbb{Z}$ with $r^2 = kr$. Since $tu = (ar)(bs) =
(ar)[b(zr)] = (abz)r^2 = (abz)(kr) = (abkz)r = (abk)(zr) = (abk)s \in S$, it follows that $S$ is
an ideal of $R$.
174.7 zero ring


A ring is a zero ring if the product of any two elements is the additive identity (or zero).


Zero rings are commutative under multiplication. For if $Z$ is a zero ring, $0_Z$ is its additive
identity, and $x, y \in Z$, then $xy = 0_Z = yx$.


Every zero ring is a nilpotent ring. For if $Z$ is a zero ring, then $Z^2 = \{0_Z\}$.


Since every subring of a ring must contain its zero element, every subring of a zero ring is
an ideal, and a zero ring has no proper prime ideals.


The simplest zero ring is $\mathbb{Z}_1 = \{0\}$.


Zero rings exist in abundance. They can be constructed from any ring. If $R$ is a ring, then
\[ \left\{ \begin{pmatrix} r & -r \\ r & -r \end{pmatrix} \;\middle|\; r \in R \right\} \]
considered as a subring of $M_{2 \times 2}(R)$ (with standard matrix addition and multiplication) is a
zero ring. Moreover, the cardinality of this subset of $M_{2 \times 2}(R)$ is the same as that of $R$.


Every finite zero ring can be written as a direct product of cyclic rings, which must also be
zero rings themselves. This is proven from the fundamental theorem of finite abelian groups.
Thus, if $p_1, \ldots, p_m$ are distinct primes, $a_1, \ldots, a_m$ are positive integers, and $n = \prod_{i=1}^m p_i^{a_i}$, then
the number of zero rings of order $n$ is $\prod_{i=1}^m P(a_i)$, where $P$ denotes the partition function.
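

The counting formula $\prod_{i=1}^m P(a_i)$ is straightforward to evaluate. Below is a minimal sketch that factors $n$ by trial division and counts integer partitions with a small dynamic program; the names `partition_count`, `prime_exponents` and `count_zero_rings` are hypothetical helpers, not taken from the entry.

```python
from math import prod


def partition_count(a: int) -> int:
    """Number of integer partitions P(a), with P(0) == 1."""
    ways = [1] + [0] * a
    for part in range(1, a + 1):
        for total in range(part, a + 1):
            ways[total] += ways[total - part]
    return ways[a]


def prime_exponents(n: int) -> list[int]:
    """Exponents a_1, ..., a_m in the prime factorization of n >= 1."""
    exponents, p = [], 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            exponents.append(e)
        p += 1
    if n > 1:  # one remaining prime factor with exponent 1
        exponents.append(1)
    return exponents


def count_zero_rings(n: int) -> int:
    """Number of zero rings of order n up to isomorphism, per the formula above."""
    return prod(partition_count(a) for a in prime_exponents(n))
```

For $n = 72 = 2^3 \cdot 3^2$ this gives $P(3) \cdot P(2) = 3 \cdot 2 = 6$.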
Chapter 175


13B02 — Extension theory


175.1 algebraic


Let $B$ be a ring with a subring $A$. An element $x \in B$ is algebraic over $A$ if there exist
elements $a_0, \ldots, a_n \in A$, with $a_n \neq 0$, such that
\[ a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0. \]
An element $x \in B$ is transcendental over $A$ if it is not algebraic.
The ring $B$ is algebraic over $A$ if every element of $B$ is algebraic over $A$.
175.2 module-finite


Let $S$ be a ring with subring $R$.

We say that $S$ is module-finite over $R$ if $S$ is finitely generated as an $R$-module.
We say that $S$ is ring-finite over $R$ if $S = R[v_1, \ldots, v_n]$ for some $v_1, \ldots, v_n \in S$.
Note that module-finite implies ring-finite, but the converse is false.

If $L$ is ring-finite over $K$, with $L$, $K$ fields, then $L$ is a finite extension of $K$.
Chapter 176


13B05 — Galois theory


176.1 algebraic


Let $K$ be an extension field of $F$ and let $a \in K$.
If there is a nonzero polynomial $f \in F[x]$ such that $f(a) = 0$ (in $K$) we say that $a$ is
algebraic over $F$.


For example, $\sqrt{2} \in \mathbb{R}$ is algebraic over $\mathbb{Q}$ since there is a nonzero polynomial with rational
coefficients, namely $f(x) = x^2 - 2$, such that $f(\sqrt{2}) = 0$.
Chapter 177


13B21 — Integral dependence


177.1 integral


Let $B$ be a ring with a subring $A$. An element $x \in B$ is integral over $A$ if there exist elements
$a_0, \ldots, a_{n-1} \in A$ such that
\[ x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0. \]


The ring $B$ is integral over $A$ if every element of $B$ is integral over $A$.
Chapter 178


13B22 — Integral closure of rings and
ideals; integrally closed rings, related
rings (Japanese, etc.)


178.1 integral closure


Let $B$ be a ring with a subring $A$. The integral closure of $A$ in $B$ is the set $A' \subset B$ consisting
of all elements of $B$ which are integral over $A$.


It is a theorem that the integral closure of $A$ in $B$ is itself a ring. In the special case where
$A = \mathbb{Z}$, the integral closure $A'$ of $\mathbb{Z}$ is often called the ring of integers in $B$.
Chapter 179


13B30 — Quotients and localization


179.1 fraction field


Given an integral domain $R$, the fraction field of $R$ is the localization $S^{-1}R$ of $R$ with respect
to the multiplicative set $S = R \setminus \{0\}$. It is always a field.
179.2 localization


Let R be a commutative ring and let S be a nonempty multiplicative subset of R. The localization of R at S is the ring $S^{-1}R$ whose elements are equivalence classes of $R \times S$ under the equivalence relation $(a, s) \sim (b, t)$ if $r(at - bs) = 0$ for some $r \in S$. Addition and multiplication in $S^{-1}R$ are defined by:


• $(a, s) + (b, t) = (at + bs, st)$
• $(a, s) \cdot (b, t) = (a \cdot b, s \cdot t)$


The equivalence class of (a, s) in $S^{-1}R$ is usually denoted a/s. For $a \in R$, the localization of R at the minimal multiplicative set containing a is written as $R_a$. When S is the complement of a prime ideal $\mathfrak{p}$ in R, the localization of R at S is written $R_{\mathfrak{p}}$.
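
The equivalence relation above can be explored by brute force on a small ring. The following is a minimal sketch, assuming R = Z/6Z and S = {1, 2, 4} (both chosen here purely for illustration, as is the helper name `localize`). It shows why the factor r is needed when R has zero divisors: 3/1 and 0/1 become equal because 2 · 3 = 0 in Z/6Z.

```python
from itertools import product

def localize(n, S):
    """Partition (Z/nZ) x S into classes of the relation
    (a, s) ~ (b, t)  iff  r * (a*t - b*s) == 0 in Z/nZ for some r in S."""
    def equivalent(p, q):
        (a, s), (b, t) = p, q
        return any(r * (a * t - b * s) % n == 0 for r in S)

    classes = []
    for pair in product(range(n), S):
        for cls in classes:
            # comparing with one representative suffices for an equivalence relation
            if equivalent(pair, cls[0]):
                cls.append(pair)
                break
        else:
            classes.append([pair])
    return classes, equivalent

S = [1, 2, 4]                      # multiplicative subset of Z/6Z generated by 2
classes, equivalent = localize(6, S)
print(len(classes))                # number of elements of the localization
print(equivalent((3, 1), (0, 1)))  # is 3/1 equal to 0/1 ?
```

In this example the localization has three elements, matching the expectation that inverting 2 in Z/6Z leaves a copy of Z/3Z.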
179.3 multiplicative set


Let A be a set on which a multiplication operation $\cdot : A \times A \to A$ has been defined. A multiplicative subset of A is a subset $S \subset A$ with the property that $s \cdot t \in S$ for every $s, t \in S$.
Chapter 180


13C10 — Projective and free modules 
and ideals 


180.1 example of free module 


Clearly from the definition, $\mathbb{Z}^n$ is free as a $\mathbb{Z}$-module for any positive integer n.
A more interesting example is the following:


Theorem 1. The set of rational numbers $\mathbb{Q}$ does not form a free $\mathbb{Z}$-module.


First note that any two elements in $\mathbb{Q}$ are $\mathbb{Z}$-linearly dependent. If $x = \frac{p_1}{q_1}$ and $y = \frac{p_2}{q_2}$, then $q_1 p_2 x - q_2 p_1 y = 0$. Since basis elements must be linearly independent, this shows that any basis must consist of only one element, say $\frac{p}{q}$, with p and q relatively prime, and without loss of generality $q > 0$. The $\mathbb{Z}$-span of $\{\frac{p}{q}\}$ is the set of rational numbers of the form $\frac{np}{q}$. I claim that $\frac{1}{q+1}$ is not in the set. If it were, then we would have $\frac{np}{q} = \frac{1}{q+1}$ for some n, but this implies that $np = \frac{q}{q+1}$, which has no solutions for $n, p \in \mathbb{Z}$, $q \in \mathbb{Z}^+$, giving us a contradiction.
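
Both steps of the argument can be checked on concrete fractions. The sketch below uses Python's `fractions.Fraction`; the helper names `dependence` and `in_span` and the sample values are illustrative assumptions, not part of the original entry.

```python
from fractions import Fraction

def dependence(x, y):
    """Integer coefficients (c1, c2) with c1*x + c2*y == 0,
    following the relation q1*p2*x - q2*p1*y = 0 from the proof."""
    p1, q1 = x.numerator, x.denominator
    p2, q2 = y.numerator, y.denominator
    return q1 * p2, -q2 * p1

def in_span(target, generator):
    """True if target is an integer multiple of the nonzero generator."""
    return (target / generator).denominator == 1

x, y = Fraction(1, 2), Fraction(3, 5)
c1, c2 = dependence(x, y)
print(c1, c2, c1 * x + c2 * y)   # a nontrivial relation summing to 0

# For the candidate basis element p/q = 3/4, the fraction 1/(q+1) = 1/5
# is not an integer multiple of it.
print(in_span(Fraction(1, 5), Fraction(3, 4)))
```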
Chapter 181


13C12 — Torsion modules and ideals 


181.1 torsion element 
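Let R be a principal ideal domain. We call an element m of the R-module M a torsion element if there exists a non-zero $\alpha \in R$ such that $\alpha \cdot m = 0$. The set of torsion elements is denoted by tor(M).


tor(M) is not empty since $0 \in \mathrm{tor}(M)$. Let $m, n \in \mathrm{tor}(M)$, so there exist non-zero $\alpha, \beta \in R$ such that $0 = \alpha \cdot m = \beta \cdot n$. Since $\alpha\beta \cdot (m - n) = \beta \cdot \alpha \cdot m - \alpha \cdot \beta \cdot n = 0$ and $\alpha\beta \neq 0$, this implies that $m - n \in \mathrm{tor}(M)$. So tor(M) is a subgroup of M. Clearly $\tau \cdot m \in \mathrm{tor}(M)$ for any $\tau \in R$ and $m \in \mathrm{tor}(M)$. This shows that tor(M) is a submodule of M, the torsion submodule of M.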


Version: 2 Owner: Thomas Heye Author(s): Thomas Heye 




Chapter 182 


13C15 — Dimension theory, depth, 
related rings (catenary, etc.) 


182.1 Krull’s principal ideal theorem 


Let R be a Noetherian ring, and P be a minimal prime over a principal ideal (x). Then the height of P, that is, the dimension of $R_P$, is at most 1. More generally, if P is a minimal prime of an ideal generated by n elements, the height of P is at most n.
Chapter 183


13C99 — Miscellaneous 


183.1 Artin-Rees theorem 


Let A be a Noetherian ring, $\mathfrak{a}$ an ideal, E a finitely generated module, and F a submodule. Then there exists an integer $s \geq 1$ such that for all integers $n \geq s$ we have


$$\mathfrak{a}^n E \cap F = \mathfrak{a}^{n-s}(\mathfrak{a}^s E \cap F).$$
183.2 Nakayama’s lemma


Let R be a commutative ring with 1. Let M be a finitely generated R-module. If there exists an ideal $\mathfrak{a}$ of R contained in the Jacobson radical and such that $\mathfrak{a}M = M$, then M = 0.
183.3 prime ideal


Let R be a ring. A two-sided proper ideal $\mathfrak{p}$ of a ring R is called a prime ideal if the following equivalent conditions are met:


1. If I and J are left ideals and the product of ideals IJ satisfies $IJ \subset \mathfrak{p}$, then $I \subset \mathfrak{p}$ or $J \subset \mathfrak{p}$.
2. If I and J are right ideals with $IJ \subset \mathfrak{p}$, then $I \subset \mathfrak{p}$ or $J \subset \mathfrak{p}$.
3. If I and J are two-sided ideals with $IJ \subset \mathfrak{p}$, then $I \subset \mathfrak{p}$ or $J \subset \mathfrak{p}$.
4. If x and y are elements of R with $xRy \subset \mathfrak{p}$, then $x \in \mathfrak{p}$ or $y \in \mathfrak{p}$.
5. $R/\mathfrak{p}$ is a prime ring.


When R is commutative with identity, a proper ideal $\mathfrak{p}$ of R is prime if and only if for any $a, b \in R$, if $a \cdot b \in \mathfrak{p}$ then either $a \in \mathfrak{p}$ or $b \in \mathfrak{p}$.


One also has in this case that an ideal $\mathfrak{p} \subset R$ is prime if and only if the quotient ring $R/\mathfrak{p}$ is an integral domain.
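
For the commutative ring Z, the elementwise criterion can be tested directly on residues, since membership of a, b and ab in nZ depends only on their classes modulo n. A minimal sketch (the function name `is_prime_ideal` is an assumption made for this illustration, and only the case n ≥ 2 is handled):

```python
def is_prime_ideal(n):
    """Check whether the proper ideal nZ (n >= 2) of Z is prime:
    a*b in nZ must imply a in nZ or b in nZ.
    Equivalently, Z/nZ has no zero divisors."""
    if n < 2:
        raise ValueError("this sketch only treats the proper ideals nZ with n >= 2")
    return all(
        (a * b) % n != 0 or a == 0 or b == 0
        for a in range(n)
        for b in range(n)
    )

print([n for n in range(2, 20) if is_prime_ideal(n)])
```

The ideals that pass are exactly those generated by prime numbers; for instance 4Z fails because 2 · 2 lies in 4Z while 2 does not.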
183.4 proof of Nakayama’s lemma


Let $X = \{x_1, x_2, \ldots, x_n\}$ be a minimal set of generators for M, in the sense that M is not generated by any proper subset of X.


Elements of $\mathfrak{a}M$ can be written as linear combinations $\sum a_i x_i$, where $a_i \in \mathfrak{a}$.


Suppose that $|X| > 0$. Since $M = \mathfrak{a}M$, we can express $x_1$ as such a linear combination:


$$x_1 = \sum_{i} a_i x_i.$$


Moving the term involving $a_1$ to the left, we have


$$(1 - a_1) x_1 = \sum_{i > 1} a_i x_i.$$


But $a_1 \in J(R)$, so $1 - a_1$ is invertible, say with inverse b. Therefore,


$$x_1 = \sum_{i > 1} b a_i x_i.$$


But this means that $x_1$ is redundant as a generator of M, and so M is generated by the proper subset $\{x_2, x_3, \ldots, x_n\}$. This contradicts the minimality of X.


We conclude that $|X| = 0$ and therefore M = 0.
183.5 proof of Nakayama’s lemma


(This proof was taken from [1].)


If M were not zero, it would have a simple quotient, isomorphic to $R/\mathfrak{m}$ for some maximal ideal $\mathfrak{m}$ of R. Then we would have $\mathfrak{m}M \neq M$, so that $\mathfrak{a}M \neq M$ as $\mathfrak{a} \subset \mathfrak{m}$.


REFERENCES 


1. Serre, J.-P. Local Algebra. Springer-Verlag, 2000. 
183.6 support


The support Supp(M) of a module M over a ring R is the set of all prime ideals $\mathfrak{p} \subset R$ such that the localization $M_{\mathfrak{p}}$ is nonzero.


The maximal support $\mathrm{Supp}_m(M)$ of a module M over a ring R is the set of all maximal ideals $\mathfrak{m} \subset R$ such that $M_{\mathfrak{m}}$ is nonzero.
Chapter 184


13E05 — Noetherian rings and 
modules 


184.1 Hilbert basis theorem 


Let R be a right (left) noetherian ring. Then R[x] is also right (left) noetherian.
Version: 6 Owner: KimJ Author(s): KimJ 


184.2 Noetherian module 
A module M over R is said to be noetherian if the following equivalent conditions hold:


1. Every submodule of M is finitely generated over R
2. The ascending chain condition holds on submodules
3. Every nonempty family of submodules has a maximal element
184.3 proof of Hilbert basis theorem


Let R be a noetherian ring and let $f(x) = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0 \in R[x]$ with $a_n \neq 0$. Then call $a_n$ the initial coefficient of f.


Let I be an ideal in R[x]. We will show I is finitely generated, so that R[x] is noetherian. Now let $f_0$ be a polynomial of least degree in I, and if $f_0, f_1, \ldots, f_k$ have been chosen then choose $f_{k+1}$ from $I \setminus (f_0, f_1, \ldots, f_k)$ of minimal degree. Continuing inductively gives a sequence $(f_k)$ of elements of I.


Let $a_k$ be the initial coefficient of $f_k$, and consider the ideal $J = (a_0, a_1, a_2, \ldots)$ of initial coefficients. Since R is noetherian, $J = (a_0, \ldots, a_N)$ for some N.


Then $I = (f_0, f_1, \ldots, f_N)$. For if not then $f_{N+1} \in I \setminus (f_0, f_1, \ldots, f_N)$, and $a_{N+1} = \sum_{k=0}^{N} u_k a_k$ for some $u_0, u_1, \ldots, u_N \in R$. Let $g(x) = \sum_{k=0}^{N} u_k f_k x^{\nu_k}$ where $\nu_k = \deg(f_{N+1}) - \deg(f_k)$.


Then $\deg(f_{N+1} - g) < \deg(f_{N+1})$, and $f_{N+1} - g \in I$ and $f_{N+1} - g \notin (f_0, f_1, \ldots, f_N)$. But this contradicts minimality of $\deg(f_{N+1})$.


Hence, R[x] is noetherian.
184.4 finitely generated modules over a principal ideal domain


Let R be a principal ideal domain and let M be a finitely generated R-module.


Lemma 1. Let M be a submodule of the R-module $R^n$. Then M is free and finitely generated by $s \leq n$ elements.


For n = 1 this is clear, since M is an ideal of R and is generated by some element $a \in R$. Now suppose that the statement is true for all submodules of $R^m$, $1 \leq m \leq n - 1$.


For a submodule M of $R^n$ we define $f : M \to R$, $(k_1, \ldots, k_n) \mapsto k_1$. Since f is R-linear, the image of f is an ideal J in R. If $J = \{0\}$, then M is contained in $\{0\} \times R^{n-1}$. Otherwise, $J = (g)$, $g \neq 0$. In the first case, elements of $\{0\} \times R^{n-1}$ can bijectively be mapped to $R^{n-1}$ by the function $(0, k_1, \ldots, k_{n-1}) \mapsto (k_1, \ldots, k_{n-1})$; so the image of M under this mapping is a submodule of $R^{n-1}$ which by the induction hypothesis is finitely generated and free.


Now let $x \in M$ with $f(x) = g$, and let $y \in M$ be arbitrary, say $f(y) = gh$. Then $f(y - hx) = f(y) - f(hx) = 0$, which is equivalent to $y - hx \in \ker(f) =: N$, and N is isomorphic to a submodule of $R^{n-1}$. This shows that $Rx + N = M$.
Let $\{g_1, \ldots, g_s\}$ be a basis of N. By assumption, $s \leq n - 1$. We’ll show that $\{x, g_1, \ldots, g_s\}$ is linearly independent. So let $rx + \sum_{i=1}^{s} r_i g_i = 0$. The first components of the $g_i$ are 0, so the first component of rx must also be 0. Since f(x) is a multiple of $g \neq 0$ and $0 = r \cdot f(x)$, then r = 0. Since $\{g_1, \ldots, g_s\}$ is linearly independent, all $r_i$ vanish as well, so $\{x, g_1, \ldots, g_s\}$ is a linearly independent generating set of M with $s + 1 \leq n$ elements.


Corollary 3. If M is a finitely generated R-module over a PID generated by s elements and N is a submodule of M, then N can be generated by s or fewer elements.


Let $\{g_1, \ldots, g_s\}$ be a generating set of M and $f : R^s \to M$, $(r_1, \ldots, r_s) \mapsto \sum_{i=1}^{s} r_i g_i$. Then the inverse image $N'$ of N is a submodule of $R^s$ and according to lemma 1 can be generated by s or fewer elements. Let $n_1, \ldots, n_t$ be a generating set of $N'$; then $t \leq s$, and since f is surjective, $f(n_1), \ldots, f(n_t)$ is a generating set of N.


Theorem 1. Let M be a finitely generated module over a principal ideal domain R. 


(I) M/tor(M) is torsion-free, i.e. $\mathrm{tor}(M/\mathrm{tor}(M)) = \{0\}$. In particular, if M is torsion-free, then M is free.


(II) Let tor(M) be a proper submodule of M. Then there exists a finitely generated free submodule F of M such that $M = F \oplus \mathrm{tor}(M)$.


Proof of (I): For short set T := tor(M). For $m \in M$, $\overline{m}$ denotes the coset modulo T generated by m. Let $\overline{m}$ be a torsion element of M/T, so there exists $\alpha \in R \setminus \{0\}$ such that $\alpha \cdot \overline{m} = \overline{0}$, which means $\alpha \cdot m \in T$. But then $\alpha \cdot m$ is a torsion element of M, so $\beta\alpha \cdot m = 0$ for some $\beta \in R \setminus \{0\}$; since $\beta\alpha \neq 0$, we get $m \in T$, that is, $\overline{m} = \overline{0}$. This implies that M/T has no non-zero torsion elements (which is obvious if M = tor(M)).


Now let N be a finitely generated torsion-free R-module with generating set $\{g_1, \ldots, g_n\}$. The homomorphism $f : R^n \to N$, $(r_1, \ldots, r_n) \mapsto \sum_{i=1}^{n} r_i g_i$ is injective since N is torsion-free. Let for instance $\{e_1, \ldots, e_n\}$ be the standard basis of $R^n$. Then the elements $f(e_1), \ldots, f(e_n)$ are linearly independent in N. Now let N = M/tor(M) where $\mathrm{tor}(M) = \{0\}$. Then the cosets can be identified with the elements of M, and the statement follows.


Proof of (II): Let $\pi : M \to M/T$, $a \mapsto a + T$. $\pi$ is surjective, so $m_1, \ldots, m_t \in M$ can be chosen such that $\pi(m_i) = n_i$, where the $n_i$’s are a basis of M/T. If $0_M = \sum_{i=1}^{t} a_i m_i$, then $0_{M/T} = \sum_{i=1}^{t} a_i n_i$. Since $n_1, \ldots, n_t$ are linearly independent in M/T, it follows that $0 = a_1 = \ldots = a_t$. So the submodule F of M spanned by $m_1, \ldots, m_t$ is free.


Now let m be some element of M and $\pi(m) = \sum_{i=1}^{t} a_i n_i$. This is equivalent to $m - \sum_{i=1}^{t} a_i m_i \in \ker(\pi) = T$. Hence, any m is a sum f + t with $f \in F$, $t \in T$. Since F is torsion-free, $F \cap T = \{0\}$, and it follows that $M = F \oplus T$.
Chapter 185


13F07 — Euclidean rings and 
generalizations 


185.1 Euclidean domain 


A Euclidean domain is an integral domain D where a Euclidean valuation $\nu$ has been defined.


Any Euclidean domain is also a principal ideal domain and therefore also a unique factorization domain. But even more important, on Euclidean domains we can define the gcd and use Euclid’s algorithm.


Examples of Euclidean domains are the ring $\mathbb{Z}$ and the polynomial ring in one variable F[x], where F is a field.
185.2 Euclidean valuation


Let D be an integral domain. A Euclidean valuation is a function from the non-zero elements of D to the non-negative integers
$$\nu : D \setminus \{0\} \to \mathbb{Z}^+ \cup \{0\}$$


such that


• For any $a, b \in D$, $b \neq 0$, there exist $q, r \in D$ such that $a = bq + r$ with $\nu(r) < \nu(b)$ or r = 0.


• For any $a, b \in D$ both non-zero, $\nu(a) \leq \nu(ab)$.


Euclidean valuations are important because they let us define greatest common divisors and use Euclid’s algorithm. Some facts about Euclidean valuations:


• The value $\nu(1)$ is minimal. That is, $\nu(1) \leq \nu(a)$ for any nonzero element a of D.


• $u \in D$ is a unit if and only if $\nu(u) = \nu(1)$.
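
As an illustration beyond the two examples named above, the Gaussian integers Z[i] carry the Euclidean valuation ν(a + bi) = a² + b². The sketch below is a hedged example, not part of the original entry: the pair representation `(re, im)` and the function names are assumptions. It implements division with remainder by rounding the exact quotient to the nearest Gaussian integer, and then runs Euclid's algorithm.

```python
def norm(z):
    """Euclidean valuation on Z[i]: nu(a + bi) = a^2 + b^2."""
    a, b = z
    return a * a + b * b

def divmod_gauss(a, b):
    """Return (q, r) with a = b*q + r and norm(r) < norm(b), for b != 0.
    Gaussian integers are represented as (re, im) pairs."""
    (a1, a2), (b1, b2) = a, b
    n = norm(b)
    # a / b = a * conj(b) / n; round each coordinate to the nearest integer
    x = a1 * b1 + a2 * b2
    y = a2 * b1 - a1 * b2
    q1 = (2 * x + n) // (2 * n)
    q2 = (2 * y + n) // (2 * n)
    r = (a1 - (b1 * q1 - b2 * q2), a2 - (b1 * q2 + b2 * q1))
    return (q1, q2), r

def gcd_gauss(a, b):
    """Euclid's algorithm; the result is a gcd, determined up to a unit."""
    while b != (0, 0):
        _, r = divmod_gauss(a, b)
        a, b = b, r
    return a

a, b = (11, 3), (1, 8)
q, r = divmod_gauss(a, b)
g = gcd_gauss(a, b)
print(q, r, g)
```

Rounding to the nearest integer keeps each coordinate of the error at most 1/2, which is what guarantees ν(r) < ν(b) in the first axiom.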
185.3 proof of Bezout’s Theorem


Let D be an integral domain with a Euclidean valuation $\nu$. Let $a, b \in D$, not both 0. Let $(a, b) = \{ax + by \mid x, y \in D\}$. Then (a, b) is an ideal in D and $(a, b) \neq \{0\}$. We choose a nonzero $d \in (a, b)$ such that $\nu(d)$ is as small as possible. Then (a, b) is generated by d, so d has the property $d \mid a$ and $d \mid b$. Any two generators of the same principal ideal are associates, so d is unique up to a unit in D. Hence d is the greatest common divisor of a and b.
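
In the Euclidean domain Z the element d and the coefficients x, y with ax + by = d can be computed explicitly. The following is a minimal sketch of the extended Euclidean algorithm (a standard technique, shown here for illustration; the function name is an assumption), which makes the existence statement of the proof constructive.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with a*x + b*y == d, where d generates the ideal (a, b) of Z."""
    # Invariant: a == a0*x0 + b0*y0 and b == a0*x1 + b0*y1 for the original a0, b0.
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

d, x, y = extended_gcd(240, 46)
print(d, x, y, 240 * x + 46 * y)
```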
185.4 proof that a Euclidean domain is a PID


Let D be a Euclidean domain, and let $\mathfrak{a} \subset D$ be an ideal. We show that $\mathfrak{a}$ is principal. The zero ideal is clearly principal, so assume $\mathfrak{a} \neq \{0\}$ and let
$$A = \{\nu(x) : x \in \mathfrak{a},\ x \neq 0\}$$


be the subset of $\mathbb{Z}$ which contains the valuations of the non-zero elements of $\mathfrak{a}$. Since A is non-empty and bounded below, it has a minimum m; suppose that $d \in \mathfrak{a}$ is an element such that $\nu(d) = m$. We contend that $\mathfrak{a} = (d)$. Clearly $(d) \subset \mathfrak{a}$; let’s prove the opposite inclusion. Let $x \in \mathfrak{a}$; there exist elements y, r such that


$$x = yd + r$$


with $\nu(r) < \nu(d)$ or r = 0. Since $r = x - yd \in \mathfrak{a}$, it must be r = 0, hence $d \mid x$, which concludes the proof.
Chapter 186


13F10 — Principal ideal rings 


186.1 Smith normalform 


Let $A \neq 0$ be an $m \times n$-matrix with entries from a principal ideal domain R. For $a \in R \setminus \{0\}$, $\delta(a)$ denotes the number of prime factors of a. Start with t = 1 and choose $j_t$ to be the smallest column index of A with a non-zero entry.


(I) If $a_{t, j_t} = 0$ and $a_{k, j_t} \neq 0$, exchange rows t and k.


(II) If there is an entry at position $(k, j_t)$ such that $a_{t, j_t} \nmid a_{k, j_t}$, then set $\beta = \gcd(a_{t, j_t}, a_{k, j_t})$ and choose $\sigma, \tau \in R$ such that


$$\sigma \cdot a_{t, j_t} - \tau \cdot a_{k, j_t} = \beta.$$


By left-multiplication with an appropriate matrix it can be achieved that row t of the matrix product is the sum of row t multiplied by $\sigma$ and row k multiplied by $(-\tau)$. Then we get $\beta$ at position $(t, j_t)$, where $\delta(\beta) < \delta(a_{t, j_t})$. Repeating these steps one ends up with a matrix having an entry at position $(t, j_t)$ that divides all entries in column $j_t$.


(III) Finally, adding appropriate multiples of row t, it can be achieved that all entries in column $j_t$ except for that at position $(t, j_t)$ are zero. This can be achieved by left-multiplication with an appropriate matrix.


Applying the steps described above to the remaining non-zero columns of the resulting matrix (if any), we get an $m \times n$-matrix with column indices $j_1, \ldots, j_r$, where $r \leq \min(m, n)$, each of which satisfies the following:


1. the entry at position $(l, j_l)$ is non-zero;
2. all entries below and above position $(l, j_l)$ as well as entries left of $(l, j_l)$ are zero.


Furthermore, all rows below the r-th row are zero.


This is a version of the Gauss algorithm for principal ideal domains which is usually described only for fields.


Now we can re-order the columns of this matrix so that elements on positions (i, i) for $1 \leq i \leq r$ are nonzero and $\delta(a_{i,i}) \leq \delta(a_{i+1,i+1})$ for $1 \leq i < r$; and all columns right of the r-th column (if present) are zero. For short set $a_i$ for the element at position (i, i). $\delta$ has non-negative integer values; so $\delta(a_1) = 0$ is equivalent to $a_1$ being a unit of R. $\delta(a_i) = \delta(a_{i+1})$ can either happen if $a_i$ and $a_{i+1}$ differ by a unit factor, or if they are relatively prime. In the latter case one can add column i + 1 to column i (which doesn’t change $a_i$) and then apply appropriate row manipulations to get $a_i = 1$. And for $\delta(a_i) < \delta(a_{i+1})$ and $a_i \nmid a_{i+1}$ one can apply step (II) after adding column i + 1 to column i. This diminishes the minimum $\delta$-values for non-zero entries of the matrix, and by reordering columns etc. we end up with a matrix whose diagonal elements $a_i$ satisfy $a_i \mid a_{i+1}$ for all $1 \leq i < r$.


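Since all row and column manipulations involved in the process are invertible, this shows that there exist invertible $m \times m$ and $n \times n$-matrices S, T so that $S \cdot A \cdot T$ is


$$\begin{pmatrix}
a_1 & 0 & \cdots & 0 & \cdots & 0 \\
0 & a_2 & \cdots & 0 & \cdots & 0 \\
\vdots & & \ddots & \vdots & & \vdots \\
0 & \cdots & 0 & a_r & \cdots & 0 \\
\vdots & & & & & \vdots \\
0 & \cdots & & & \cdots & 0
\end{pmatrix} \qquad (186.1.1)$$


This is the Smith normalform of the matrix. The elements $a_i$ are unique up to associatedness and are called elementary divisors.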
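
For R = Z the diagonal form can be computed with a short program. The sketch below is a simplified variant rather than the exact procedure described above: instead of the δ-based step (II) it uses Euclidean division in Z, always pivoting on a nonzero entry of least absolute value, and it returns only the diagonal matrix, not the transforming matrices S and T. The function name and the sample matrix are assumptions made for illustration.

```python
def smith_normal_form(matrix):
    """Diagonalize an integer matrix by invertible row and column operations
    so that each diagonal entry divides the next (sketch for R = Z only)."""
    A = [list(row) for row in matrix]
    m, n = len(A), len(A[0])
    t = 0
    while t < min(m, n):
        # choose a nonzero entry of least absolute value in the remaining block
        candidates = [(abs(A[i][j]), i, j)
                      for i in range(t, m) for j in range(t, n) if A[i][j] != 0]
        if not candidates:
            break
        _, pi, pj = min(candidates)
        A[t], A[pi] = A[pi], A[t]                 # row swap
        for row in A:                             # column swap
            row[t], row[pj] = row[pj], row[t]
        pivot = A[t][t]
        clean = True
        for i in range(t + 1, m):                 # reduce column t by row operations
            q = A[i][t] // pivot
            A[i] = [x - q * y for x, y in zip(A[i], A[t])]
            clean = clean and A[i][t] == 0
        for j in range(t + 1, n):                 # reduce row t by column operations
            q = A[t][j] // pivot
            for row in A:
                row[j] -= q * row[t]
            clean = clean and A[t][j] == 0
        if not clean:
            continue                              # a smaller remainder appeared; pivot again
        bad_rows = [i for i in range(t + 1, m)
                    if any(A[i][j] % pivot for j in range(t + 1, n))]
        if bad_rows:
            # pivot does not divide some later entry: pull that row into row t
            A[t] = [x + y for x, y in zip(A[t], A[bad_rows[0]])]
            continue
        if pivot < 0:
            A[t] = [-x for x in A[t]]             # normalize the sign (a unit of Z)
        t += 1
    return A

print(smith_normal_form([[2, 4, 4], [-6, 6, 12], [10, -4, -16]]))
```

The loop terminates because every pass that does not advance t leads to a strictly smaller least absolute value among the remaining nonzero entries.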
Chapter 187


13F25 — Formal power series rings 


187.1 formal power series 


Formal power series allow one to employ much of the analytical machinery of power series in settings which don’t have natural notions of convergence. They are also useful in order to compactly describe sequences and to find closed formulas for recursively described sequences; this is known as the method of generating functions and will be illustrated below.


We start with a commutative ring R. We want to define the ring of formal power series over R in the variable X, denoted by R[[X]]; each element of this ring can be written in a unique way as an infinite sum of the form $\sum_{n=0}^{\infty} a_n X^n$, where the coefficients $a_n$ are elements of R; any choice of coefficients $a_n$ is allowed. R[[X]] is actually a topological ring, so that these infinite sums are well-defined and convergent. The addition and multiplication of such sums follow the usual laws of power series.


Formal construction. Start with the set $R^{\mathbb{N}}$ of all infinite sequences in R. Define addition of two such sequences by
$$(a_n) + (b_n) = (a_n + b_n)$$


and multiplication by
$$(a_n)(b_n) = \Big(\sum_{k=0}^{n} a_k b_{n-k}\Big).$$


This turns $R^{\mathbb{N}}$ into a commutative ring with multiplicative identity (1, 0, 0, ...). We identify the element a of R with the sequence (a, 0, 0, ...) and define X := (0, 1, 0, 0, ...). Then every element of $R^{\mathbb{N}}$ of the form $(a_0, a_1, a_2, \ldots, a_N, 0, 0, \ldots)$ can be written as the finite sum
$$\sum_{n=0}^{N} a_n X^n.$$


In order to extend this equation to infinite series, we need a metric on $R^{\mathbb{N}}$. We define $d((a_n), (b_n)) = 2^{-k}$, where k is the smallest natural number such that $a_k \neq b_k$ (if there is no such k, then the two sequences are equal and we define their distance to be zero). This is a metric which turns $R^{\mathbb{N}}$ into a topological ring, and the equation


$$(a_n) = \sum_{n=0}^{\infty} a_n X^n$$


can now be rigorously proven using the notion of convergence arising from d; in fact, any rearrangement of the series converges to the same limit.


This topological ring is the ring of formal power series over R and is denoted by R[[X]].


Properties. R[[X]] is an associative algebra over R which contains the ring R[X] of polynomials over R; the polynomials correspond to the sequences which end in zeros.


The geometric series formula is valid in R[[X]]:
$$(1 - X)^{-1} = \sum_{n=0}^{\infty} X^n$$


An element $\sum a_n X^n$ of R[[X]] is invertible in R[[X]] if and only if its constant coefficient $a_0$ is invertible in R. This implies that the Jacobson radical of R[[X]] is the ideal generated by X and the Jacobson radical of R.
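
Working with the first N coefficients only, the multiplication rule and the invertibility criterion can be tried out numerically. This is a minimal sketch over the rationals; the list representation and the names `mul` and `inverse` are assumptions for illustration. The inverse is obtained by solving the equations "coefficient of $X^n$ in $a \cdot b$ equals 0" for n ≥ 1 one at a time, which only ever requires dividing by the constant coefficient.

```python
from fractions import Fraction

def mul(a, b, N):
    """First N coefficients of the product of two power series
    (both given by at least N coefficients)."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]

def inverse(a, N):
    """First N coefficients of 1/a; requires a[0] to be invertible
    (here: a nonzero rational number)."""
    b = [Fraction(1) / a[0]]
    for n in range(1, N):
        # the coefficient of X^n in a*b must vanish
        b.append(-b[0] * sum(a[k] * b[n - k] for k in range(1, n + 1)))
    return b

N = 6
one_minus_x = [1, -1, 0, 0, 0, 0]
geometric = inverse(one_minus_x, N)
print(geometric)                       # coefficients of (1 - X)^(-1)
print(mul(one_minus_x, geometric, N))  # should be the series 1
```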


Several algebraic properties of R are inherited by R[[X]]:


• if R is a local ring, then so is R[[X]]
• if R is noetherian, then so is R[[X]]
• if R is an integral domain, then so is R[[X]]
• if R is a field, then R[[X]] is a discrete valuation ring


The metric space (R[[X]], d) is complete. The topology on R[[X]] is equal to the product topology on $R^{\mathbb{N}}$, where R is equipped with the discrete topology. It follows from Tychonoff’s theorem that R[[X]] is compact if and only if R is finite. The topology on R[[X]] can also be seen as the I-adic topology, where I = (X) is the ideal generated by X (whose elements are precisely the formal power series with zero constant coefficient).


If K = R is a field, we can consider the quotient field of the integral domain K[[X]]; it is denoted by K((X)). It is a topological field whose elements are called formal Laurent series; they can be uniquely written in the form


$$f = \sum_{n=M}^{\infty} a_n X^n$$


where M is an integer which depends on the Laurent series f.
Formal power series as functions. In analysis, every convergent power series defines a function with values in the real or complex numbers. Formal power series can also be interpreted as functions, but one has to be careful with the domain and codomain. If $f = \sum a_n X^n$ is an element of R[[X]], if S is a commutative associative algebra over R, if I is an ideal in S such that the I-adic topology on S is complete, and if x is an element of I, then we can define


$$f(x) := \sum_{n=0}^{\infty} a_n x^n.$$


This latter series is guaranteed to converge in S given the above assumptions. Furthermore, we have


$$(f + g)(x) = f(x) + g(x)$$
and
$$(fg)(x) = f(x)g(x)$$
(unlike in the case of bona fide functions, these formulas are not definitions but have to be proved).


Since the topology on R[[X]] is the (X)-adic topology and R[[X]] is complete, we can in particular apply power series to other power series, provided that the arguments don’t have constant coefficients: $f(0)$, $f(X^2 - X)$ and $f((1 - X)^{-1} - 1)$ are all well-defined for any formal power series $f \in R[[X]]$.


With this formalism, we can give an explicit formula for the multiplicative inverse of a power series f whose constant coefficient a = f(0) is invertible in R:


$$f^{-1} = \sum_{n=0}^{\infty} a^{-n-1} (a - f)^n$$


Differentiating formal power series. If $f = \sum_{n=0}^{\infty} a_n X^n \in R[[X]]$, we define the formal derivative of f as


$$Df = \sum_{n=1}^{\infty} a_n n X^{n-1}.$$


This operation is R-linear, obeys the product rule
$$D(f \cdot g) = (Df) \cdot g + f \cdot (Dg)$$


and the chain rule
$$D(f(g)) = (Df)(g) \cdot Dg$$
(in case g(0) = 0).


In a sense, all formal power series are Taylor series, because if $f = \sum a_n X^n$, then


$$(D^k f)(0) = k! \, a_k$$
(here k! denotes the element $1 \times (1 + 1) \times (1 + 1 + 1) \times \ldots \in R$).


One can also define differentiation for formal Laurent series in a natural way, and then the quotient rule, in addition to the rules listed above, will also be valid.


Power series in several variables. The fastest way to define the ring $R[[X_1, \ldots, X_r]]$ of formal power series over R in r variables starts with the ring $S = R[X_1, \ldots, X_r]$ of polynomials over R. Let I be the ideal in S generated by $X_1, \ldots, X_r$, consider the I-adic topology on S, and form its completion. This results in a complete topological ring containing S which is denoted by $R[[X_1, \ldots, X_r]]$.


For $n = (n_1, \ldots, n_r) \in \mathbb{N}^r$, we write $X^n = X_1^{n_1} \cdots X_r^{n_r}$. Then every element of $R[[X_1, \ldots, X_r]]$ can be written in a unique way as a sum


$$\sum_{n \in \mathbb{N}^r} a_n X^n$$


where the sum extends over all $n \in \mathbb{N}^r$. These sums converge for any choice of the coefficients $a_n \in R$, and the order in which the summation is carried out does not matter.


If J is the ideal in $R[[X_1, \ldots, X_r]]$ generated by $X_1, \ldots, X_r$ (i.e. J consists of those power series with zero constant coefficient), then the topology on $R[[X_1, \ldots, X_r]]$ is the J-adic topology.


Since $R[[X_1]]$ is a commutative ring, we can define its power series ring, say $R[[X_1]][[X_2]]$. This ring is naturally isomorphic to the ring $R[[X_1, X_2]]$ just defined, but as topological rings the two are different.


If K = R is a field, then $K[[X_1, \ldots, X_r]]$ is a unique factorization domain.


Similar to the situation described above, we can “apply” power series in several variables to other power series with zero constant coefficients. It is also possible to define partial derivatives for formal power series in a straightforward way. Partial derivatives commute, as they do for continuously differentiable functions.


Uses. One can use formal power series to prove several relations familiar from analysis in a purely algebraic setting. Consider for instance the following elements of $\mathbb{Q}[[X]]$:


$$\sin(X) := \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} X^{2n+1}, \qquad \cos(X) := \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} X^{2n}.$$


Then one can easily show that
$$\sin^2 + \cos^2 = 1$$
and
$$D \sin = \cos$$


as well as
$$\sin(X + Y) = \sin(X)\cos(Y) + \cos(X)\sin(Y)$$
(the latter being valid in the ring $\mathbb{Q}[[X, Y]]$).
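
The first two identities can be checked coefficient by coefficient on truncations, using exact rational arithmetic. A minimal sketch (the truncation length N = 12 and the helper names are arbitrary choices for this illustration; a finite check of this kind is evidence, not a proof):

```python
from fractions import Fraction
from math import factorial

N = 12  # number of coefficients kept

# coefficients of the standard sine and cosine series up to X^(N-1)
sin = [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 1 else Fraction(0)
       for n in range(N)]
cos = [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 0 else Fraction(0)
       for n in range(N)]

def mul(a, b):
    """Product of two truncated series, again truncated to N coefficients."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]

def derivative(a):
    """Formal derivative: the coefficient of X^(n-1) is n * a_n."""
    return [n * a[n] for n in range(1, len(a))]

pythagoras = [s + c for s, c in zip(mul(sin, sin), mul(cos, cos))]
print(pythagoras)                       # should be 1, 0, 0, ...
print(derivative(sin) == cos[:N - 1])   # D sin agrees with cos on the truncation
```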
As an example of the method of generating functions, consider the problem of finding a closed formula for the Fibonacci numbers $f_n$ defined by $f_{n+2} = f_{n+1} + f_n$, $f_0 = 0$, and $f_1 = 1$. We work in the ring $\mathbb{R}[[X]]$ and define the power series


$$f = \sum_{n=0}^{\infty} f_n X^n;$$


f is called the generating function for the sequence $(f_n)$. The generating function for the sequence $(f_{n-1})$ is $Xf$ while that for $(f_{n-2})$ is $X^2 f$. From the recurrence relation, we therefore see that the power series $Xf + X^2 f$ agrees with f except for the first two coefficients. Taking these into account, we find that


$$f = Xf + X^2 f + X$$


(this is the crucial step; recurrence relations can almost always be translated into equations for the generating functions). Solving this equation for f, we get


$$f = \frac{X}{1 - X - X^2}.$$


Using the golden ratio $\phi_1 = (1 + \sqrt{5})/2$ and $\phi_2 = (1 - \sqrt{5})/2$, we can write the latter expression as


$$f = \frac{1}{\sqrt{5}} \left( \frac{1}{1 - \phi_1 X} - \frac{1}{1 - \phi_2 X} \right).$$


These two power series are known explicitly because they are geometric series; comparing coefficients, we find the explicit formula


$$f_n = \frac{1}{\sqrt{5}} \left( \phi_1^n - \phi_2^n \right).$$
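
The closed formula can be compared numerically with the coefficients obtained from the functional equation $f = Xf + X^2 f + X$. The sketch below uses floating-point arithmetic for the golden-ratio expression, so the agreement is only checked for small n where rounding is harmless; the function names are assumptions for illustration.

```python
from math import sqrt

def series_coefficients(N):
    """First N coefficients of X / (1 - X - X^2), read off from
    f = X*f + X^2*f + X by comparing the coefficient of X^n on both sides."""
    f = [0] * N
    for n in range(N):
        f[n] = (1 if n == 1 else 0)
        if n >= 1:
            f[n] += f[n - 1]
        if n >= 2:
            f[n] += f[n - 2]
    return f

def closed_formula(n):
    """(phi1^n - phi2^n) / sqrt(5), rounded to the nearest integer."""
    phi1 = (1 + sqrt(5)) / 2
    phi2 = (1 - sqrt(5)) / 2
    return round((phi1 ** n - phi2 ** n) / sqrt(5))

coefficients = series_coefficients(30)
print(coefficients[:10])
print(all(closed_formula(n) == coefficients[n] for n in range(30)))
```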


In algebra, the ring $K[[X_1, \ldots, X_r]]$ (where K is a field) is often used as the “standard, most general” complete local ring over K.


Universal property. The power series ring $R[[X_1, \ldots, X_r]]$ can be characterized by the following universal property: if S is a commutative associative algebra over R, if I is an ideal in S such that the I-adic topology on S is complete, and if $x_1, \ldots, x_r \in I$ are given, then there exists a unique $\Phi : R[[X_1, \ldots, X_r]] \to S$ with the following properties:


• $\Phi$ is an R-algebra homomorphism
• $\Phi$ is continuous
• $\Phi(X_i) = x_i$ for i = 1, ..., r.
Chapter 188


13F30 — Valuation rings 


188.1 discrete valuation 


A discrete valuation on a field K is a valuation $|\cdot| : K \to \mathbb{R}$ whose image is a discrete subset of $\mathbb{R}$.


For any field K with a discrete valuation $|\cdot|$, the set
$$R := \{x \in K : |x| \leq 1\}$$
is a subring of K with sole maximal ideal
$$M := \{x \in K : |x| < 1\},$$


and hence R is a discrete valuation ring. Conversely, given any discrete valuation ring R, the field of fractions K of R admits a discrete valuation sending each nonzero element $x \in R$ to $c^n$, where 0 < c < 1 is some arbitrary fixed constant and n is the order of x, and extending multiplicatively to K.


Note: Discrete valuations are often written additively instead of multiplicatively; under this alternate viewpoint, the element x maps to $\log_c |x|$ (in the above notation) instead of just $|x|$. This transformation reverses the order of the absolute values (since c < 1), and sends the element $0 \in K$ to $\infty$. It has the advantage that every valuation can be normalized by a suitable scalar multiple to take values in the integers.
188.2 discrete valuation ring


A discrete valuation ring R is a principal ideal domain with exactly one nonzero maximal ideal M. Any generator t of M is called a uniformizer or uniformizing element of R; in other words, a uniformizer of R is an element $t \in R$ such that $t \in M$ but $t \notin M^2$.


Given a discrete valuation ring R and a uniformizer $t \in R$, every nonzero element $z \in R$ can be written uniquely in the form $u \cdot t^n$ for some unit $u \in R$ and some nonnegative integer $n \in \mathbb{Z}$. The integer n is called the order of z, and its value is independent of the choice of uniformizing element $t \in R$.
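
A standard concrete instance, added here as an illustration, is the ring of rational numbers whose denominator is not divisible by a fixed prime p; it is a discrete valuation ring with uniformizer p. The sketch below computes the order and the corresponding multiplicative valuation $|x| = c^n$ on $\mathbb{Q}$; the function names and the choice c = 1/2 are assumptions.

```python
from fractions import Fraction

def order(x, p):
    """p-adic order of a nonzero rational x: the integer n with x = u * p^n,
    where numerator and denominator of u are prime to p."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("the order of 0 is infinite")
    num, den, n = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        n += 1
    while den % p == 0:
        den //= p
        n -= 1
    return n

def absolute_value(x, p, c=Fraction(1, 2)):
    """Multiplicative form |x| = c^order(x), with |0| = 0 and 0 < c < 1."""
    return Fraction(0) if x == 0 else c ** order(x, p)

z = Fraction(12)
n = order(z, 2)
u = z / Fraction(2) ** n          # z = u * 2^n with u a unit
print(n, u, order(u, 2))
print(absolute_value(Fraction(3, 25), 5))
```

Elements with $|x| \leq 1$ are exactly those of nonnegative order, which matches the description of the valuation ring in the previous entry.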
Chapter 189


13G05 — Integral domains 


189.1 Dedekind-Hasse valuation 


If D is an integral domain then it is a PID iff it has a Dedekind-Hasse valuation, that is, a function $\nu : D - \{0\} \to \mathbb{Z}^+$ such that for any $a, b \in D - \{0\}$ either


• $a \in (b)$


or


• $\exists \alpha \in (a)\ \exists \beta \in (b)\ [0 < \nu(\alpha + \beta) < \nu(b)]$


Proof: First, let $\nu$ be a Dedekind-Hasse valuation and let I be a nonzero ideal of an integral domain D. Take some nonzero $b \in I$ with $\nu(b)$ minimal (this exists because the positive integers are well-ordered) and some $a \in I$ such that $a \neq 0$. I must contain both (a) and (b), and since it is closed under addition, $\alpha + \beta \in I$ for any $\alpha \in (a)$, $\beta \in (b)$.


Since $\nu(b)$ is minimal, the second possibility above is ruled out, so it follows that $a \in (b)$. But this holds for any $a \in I$, so I = (b), and therefore every ideal is principal.


For the converse, let D be a PID. Then define $\nu(u) = 1$ for any unit u. Any non-zero, non-unit can be factored into a finite product of irreducibles (since every PID is a UFD), and every such factorization of a is of the same length, r. So for $a \in D$, a non-zero non-unit, let $\nu(a) = r + 1$. Obviously $r \in \mathbb{Z}^+$.


Then take any $a, b \in D - \{0\}$ and suppose $a \notin (b)$. Then take the ideal of elements of the form $\{\alpha + \beta \mid \alpha \in (a), \beta \in (b)\}$. Since this is a PID, it is a principal ideal (c) for some $c \in D - \{0\}$, and since $0 + b = b \in (c)$, there is some non-unit $x \in D$ such that xc = b. Then $\nu(b) = \nu(xc)$. But since x is not a unit, the factorization of b must be longer than the factorization of c, so $\nu(b) > \nu(c)$, so $\nu$ is a Dedekind-Hasse valuation.
189.2 PID


A principal ideal domain D is an integral domain where every ideal is a principal ideal.


In a PID, a nonzero ideal (p) is maximal if and only if p is irreducible (and prime, since any PID is also a UFD).
189.3 UFD


An integral domain D such that


• Every nonzero element of D that is not a unit can be factored into a product of a finite number of irreducibles
• If $p_1 p_2 \cdots p_r$ and $q_1 q_2 \cdots q_s$ are two factorizations of the same element into irreducibles, then r = s and we can reorder the $q_j$ in a way that $q_j$ is an associate element of $p_j$


is called a unique factorization domain (UFD).


Some of the classic results about UFDs:


On a UFD, the concepts of prime element and irreducible element coincide.


If F is a field, F[x] is a UFD.


If D is a UFD, then D[x] (the ring of polynomials in the variable x over D) is also a UFD.


Since $R[x, y] \cong R[x][y]$, these results can be extended to rings of polynomials with a finite number of variables.


If D is a principal ideal domain, then it is also a UFD.


The converse is, however, not true. Let F be a field and consider the UFD F[x, y]. Let I be the ideal consisting of all the elements of F[x, y] whose constant term is 0. Then it can be proved that I is not a principal ideal. Therefore not every UFD is a PID.
189.4 a finite integral domain is a field


A finite (commutative) integral domain is a field.


Let R be a finite integral domain. Let a be a nonzero element of R.
Define a function $\varphi : R \to R$ by $\varphi(r) = ar$.


Suppose $\varphi(r) = \varphi(s)$ for some $r, s \in R$. Then ar = as, which implies a(r - s) = 0. Since $a \neq 0$ and R is a cancellation ring, we have r - s = 0. So r = s, and hence $\varphi$ is injective.
Since R is finite and $\varphi$ is injective, by the pigeonhole principle we see that $\varphi$ is also surjective.


Thus there exists some $b \in R$ such that $\varphi(b) = ab = 1_R$, and thus a is a unit.


Thus R is a finite division ring. Since it is commutative, it is also a field.
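
The argument is effectively an algorithm: list the images of the map r ↦ ar and look up where 1 occurs. A minimal sketch for the finite rings Z/nZ (the function names are assumptions for illustration) shows the map being a bijection when n is prime, and failing to be injective in Z/6Z, which is not an integral domain.

```python
def multiplication_map(a, n):
    """Images of 0, 1, ..., n-1 under r -> a*r in Z/nZ."""
    return [(a * r) % n for r in range(n)]

def inverse_by_pigeonhole(a, p):
    """For a nonzero a in the integral domain Z/pZ (p prime), the map
    r -> a*r is injective, hence surjective, so 1 has a preimage b."""
    image = multiplication_map(a, p)
    if len(set(image)) != p:
        raise ValueError("the map is not injective; Z/pZ is not an integral domain here")
    return image.index(1)

b = inverse_by_pigeonhole(3, 7)
print(b, (3 * b) % 7)
print(multiplication_map(2, 6))   # repeated values: 2 is a zero divisor in Z/6Z
```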
189.5 an artinian integral domain is a field


Let R be an integral domain and artinian.
Let $a \in R$ with $a \neq 0$. Then $R \supseteq aR \supseteq a^2 R \supseteq \ldots$. Since R is artinian, this descending chain becomes stationary, so $a^n R = a^{n+1} R$ for some n.


If $a^n R = a^{n+1} R$, then there exists $r \in R$ such that $a^n = a^{n+1} r$; therefore, since $a^n \neq 0$ (since R is an integral domain), we must have 1 = ar. Hence a is a unit.


Therefore, every artinian integral domain is a field.
189.6 example of PID


Important examples of principal ideal domains:


• The ring of the integers $\mathbb{Z}$.
• The ring of polynomials in one variable over a field, i.e. a ring of the form F[X], where F is a field. Note that the ring of polynomials in more than one variable over a field is never a PID.


Both of these examples are actually examples of Euclidean rings, which are always PIDs. There are, however, more complicated examples of PIDs which are not Euclidean rings.
189.7 field of quotients


field of quotients 


1: integral domain R
2: $A = \{a/b \mid a, b \in R\ \&\ b \neq 0\}$
3: $\forall a_1/b_1, a_2/b_2 \in A : a_1 b_2 = a_2 b_1 \Leftrightarrow a_1/b_1 \sim a_2/b_2$
4: $A/\sim$


random and presumably unrelated definition 


1: An irreducible polynomial
2: in F[x]
3: for some field F
4: whose formal derivative is nonzero
5: If the characteristic of F is 0, then every nonzero irreducible polynomial is separable
6: If the characteristic of F is $p \neq 0$, then a nonzero irreducible polynomial is separable if and only if it cannot be written as a polynomial in $x^p$.
Note: This is a “seed” entry written using a short-hand format described in this FAQ 


Version: 3 Owner: bwebste Author(s): yark, apmxi 


189.8 integral domain 


An integral domain is a cancellation ring which has an identity element $1 \neq 0$.


Integral domains are usually also assumed to be commutative.
189.9 irreducible


Let D be an integral domain, and let r be a nonzero element of D. We say that r is irreducible in D if for any factorization r = ab in D we must have that a or b is a unit.
189.10 motivation for Euclidean domains


UFDs, PIDs, and Euclidean domains are ways of building successively more of standard number theory into a domain.


First, observe that the units are numbers that are 1-like. Obvious examples, besides 1 itself, are $-1$ in $\mathbb{Z}$ or $i$ and $-i$ in the Gaussian integers.


Ideals behave something like the set of multiples of an element; in $\mathbb{Z}$ those are the ideals (together with the zero ideal), and ideals in other rings have some similar behavior.


In commutative rings, prime ideals have one property similar to the prime numbers. Specifically, a product of two elements is in the ideal exactly when one of those elements is already in the ideal, the way that if $a \cdot b$ is even we know that either a or b is even, but if we know it is a multiple of four, that could be because both are even but not divisible by four.


The other property most associated with prime numbers is their irreducibility: the only way to factor an irreducible element is to use a unit, and since units are "1-like," that doesn't really break the element into smaller pieces. (Specifically, the non-unit factor can always be broken into another unit times the original irreducible element.)


In a UFD these two properties of prime numbers coincide for non-zero numbers. All prime elements (generators of prime ideals) are irreducible and all irreducibles are prime elements. In addition, all numbers can be factored into prime elements the same way integers can be factored into primes.


A principal ideal domain behaves even more like the integers by adding the concept of a greatest common divisor. Formally this holds because for any two ideals, in any ring, we can find a minimal ideal which contains both of them, and in a PID we have the guarantee that the new ideal is generated by a particular element: the greatest common divisor. The Dedekind-Hasse valuation on the ring encodes this property, by requiring that, if a is not a multiple of b (that is, a is not in (b)) then there is a common divisor which is simpler than b (the formal definition is that the element be of the form ax + by, but there is, in general, a connection between linear combinations of elements and their greatest common divisor).


Being a Euclidean domain is an even stronger requirement, the most important effect of which is to provide the Euclidean algorithm for finding g.c.d.'s. The key property is that division with remainders can be performed akin to the way it is done in the integers. A Euclidean valuation again encodes this property by ensuring that remainders are limited (specifically, requiring that the norm of the remainder be less than the norm of the divisor). This forces the remainders to get successively smaller, guaranteeing that the process eventually halts.
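

The halting argument can be watched in action in the Gaussian integers, which are a Euclidean domain under the norm N(x + iy) = x² + y². The sketch below is only an illustration: Gaussian integers are stored as integer pairs (x, y), and the helper names `norm`, `divmod_gauss` and `gcd_gauss` are invented for this example.

```python
def norm(z):
    x, y = z
    return x * x + y * y

def mul(z, w):
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def round_div(p, n):
    # Nearest integer to p/n for n > 0, using only integer arithmetic.
    return (2 * p + n) // (2 * n)

def divmod_gauss(a, b):
    """Return (q, r) with a = q*b + r and norm(r) < norm(b), for b != (0, 0)."""
    n = norm(b)
    num = mul(a, (b[0], -b[1]))          # a * conj(b), so a/b = num / n
    q = (round_div(num[0], n), round_div(num[1], n))
    qb = mul(q, b)
    return q, (a[0] - qb[0], a[1] - qb[1])

def gcd_gauss(a, b):
    # The norm of the remainder strictly decreases, so the loop halts.
    while b != (0, 0):
        _, r = divmod_gauss(a, b)
        a, b = b, r
    return a
```

Rounding each coordinate of the exact quotient to the nearest integer leaves an error of at most 1/2 per coordinate, so the remainder has norm at most half that of the divisor. The result is a greatest common divisor only up to multiplication by a unit (±1, ±i).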
189.11 zero divisor


Let R be a ring. A nonzero element $a \in R$ is called a zero divisor if there exists a nonzero element $b \in R$ such that $a \cdot b = 0$.


Example: Let $R = \mathbb{Z}_6$. Then the elements 2 and 3 are zero divisors, since $2 \cdot 3 \equiv 6 \equiv 0 \pmod{6}$.
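

For small moduli the definition can be checked by brute force. The function name `zero_divisors` below is a hypothetical helper for this example; it lists the zero divisors of the ring of integers modulo n.

```python
def zero_divisors(n: int) -> list[int]:
    """Nonzero a in Z_n for which some nonzero b satisfies a*b = 0 (mod n)."""
    return [a for a in range(1, n)
            if any(a * b % n == 0 for b in range(1, n))]
```

For n = 6 this finds 2, 3 and 4, while a prime modulus such as 7 gives an empty list, as expected for a field.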


Version: 2 Owner: saforres Author(s): saforres 




Chapter 190 


13H05 — Regular local rings 


190.1 regular local ring 


A local ring R of dimension n is regular if and only if its maximal ideal m is generated by n elements.


Equivalently, R is regular if $\dim_{R/m} m/m^2 = \dim R$, where the first dimension is that of a vector space and the latter is the Krull dimension, since by Nakayama's lemma, elements generate m if and only if their images under the natural projection generate $m/m^2$.


By Krull's principal ideal theorem, m cannot be generated by fewer than n elements, so the maximal ideals of regular local rings have a minimal number of generators.
Chapter 191


13H99 — Miscellaneous 


191.1 local ring 


Commutative case 


A commutative ring with multiplicative identity is called local if it has exactly one maximal ideal. This is the case if and only if $1 \neq 0$ and the sum of any two non-units in the ring is again a non-unit; the unique maximal ideal consists precisely of the non-units.
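

The non-unit criterion is easy to test on the finite rings of integers modulo n, which are local exactly when n is a prime power. The sketch below is an illustration under that choice of example ring; `non_units` and `is_local` are names invented here.

```python
from math import gcd

def non_units(n: int) -> list[int]:
    """Residues modulo n that are not invertible (0 included when n > 1)."""
    return [a for a in range(n) if gcd(a, n) != 1]

def is_local(n: int) -> bool:
    """Check 1 != 0 and that the sum of two non-units is again a non-unit."""
    nu = set(non_units(n))
    return n > 1 and all((a + b) % n in nu for a in nu for b in nu)
```

Modulo 8 the non-units are the even residues, which are closed under addition, so the ring is local with maximal ideal (2). Modulo 6 the non-units 2 and 3 add up to the unit 5, so the criterion fails.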


The name comes from the fact that these rings are important in the study of the local behavior of varieties and manifolds: the ring of germs of functions at a point is always local. (The reason is simple: a germ f is invertible in the ring of germs at x if and only if $f(x) \neq 0$, which implies that the sum of two non-invertible elements is again non-invertible.) This is also why schemes, the generalizations of varieties, are defined as certain locally ringed spaces.


Other examples of local rings include: 


• All fields are local. The unique maximal ideal is (0).
• Rings of formal power series over a field are local, even in several variables. The unique maximal ideal consists of those power series without constant term.
• If R is a commutative ring with multiplicative identity, and p is a prime ideal in R, then the localization of R at p, written as $R_p$, is always local. The unique maximal ideal in this ring is $pR_p$.
• All discrete valuation rings are local.


A local ring R with maximal ideal m is also written as (R, m).


Every local ring (R, m) is a topological ring in a natural way, taking the powers of m as a neighborhood base of 0.


Given two local rings (R, m) and (S, n), a local ring homomorphism from R to S is a ring homomorphism $f : R \to S$ (respecting the multiplicative identities) with $f(m) \subseteq n$. These are precisely the ring homomorphisms that are continuous with respect to the given topologies on R and S.


The residue field of the local ring (R, m) is the field R/m.


General case 


One also considers non-commutative local rings. A ring with multiplicative identity is called local if it has a unique maximal left ideal. In that case, the ring also has a unique maximal right ideal, and the two ideals coincide with the ring's Jacobson radical, which in this case consists precisely of the non-units in the ring.


A ring R is local if and only if the following condition holds: we have $1 \neq 0$, and whenever $x \in R$ is not invertible, then $1 - x$ is invertible.


All skew fields are local rings. More interesting examples are given by endomorphism rings: a finite-length module over some ring is indecomposable if and only if its endomorphism ring is local, a consequence of Fitting's lemma.
191.2 semi-local ring


A semi-local ring is a commutative ring with finitely many maximal ideals.
Chapter 192


13J10 — Complete rings, completion 


192.1 completion 


Let (X, d) be a metric space. Let $\bar{X}$ be the set of all Cauchy sequences $\{x_n\}_{n \in \mathbb{N}}$ in X. Define an equivalence relation $\sim$ on $\bar{X}$ by setting $\{x_n\} \sim \{y_n\}$ if the interleave sequence of the sequences $\{x_n\}$ and $\{y_n\}$ is also a Cauchy sequence. The completion of X is defined to be the set $\hat{X}$ of equivalence classes of $\bar{X}$ modulo $\sim$.


The metric d on X extends to a metric on $\hat{X}$ in the following manner:


$$d(\{x_n\}, \{y_n\}) = \lim_{n \to \infty} d(x_n, y_n),$$


where $\{x_n\}$ and $\{y_n\}$ are representative Cauchy sequences of elements in $\hat{X}$. The definition of $\sim$ is tailored so that the limit in the above definition is well defined, and the fact that these sequences are Cauchy, together with the fact that $\mathbb{R}$ is complete, ensures that the limit exists. The space $\hat{X}$ with this metric is of course a complete metric space.


Note the similarity between the construction of $\hat{X}$ and the construction of $\mathbb{R}$ from $\mathbb{Q}$. The process used here is the same as that used to construct the real numbers $\mathbb{R}$, except for the minor detail that one cannot use the terminology of metric spaces in the construction of $\mathbb{R}$ itself because it is necessary to construct $\mathbb{R}$ in the first place before one can define metric spaces.


192.1.1 Metric spaces with richer structure 


If the metric space X has an algebraic structure, then in many cases this algebraic structure carries through unchanged to $\hat{X}$ simply by applying it one element at a time to sequences in X. We will not attempt to state this principle precisely, but we will mention the following important instances:


1. If $(X, \cdot)$ is a topological group, then $\hat{X}$ is also a topological group with multiplication defined by

$$\{x_n\} \cdot \{y_n\} = \{x_n \cdot y_n\}.$$


2. If X is a topological ring, then addition and multiplication extend to $\hat{X}$ and make the completion into a topological ring.


3. If F is a field with a valuation v, then the completion of F with respect to the metric imposed by v is a topological field, denoted $F_v$, and called the completion of F at v.


192.1.2 Universal property of completions 


The completion $\hat{X}$ of X satisfies the following universal property: for every uniformly continuous map $f : X \to Y$ of X into a complete metric space Y, there exists a unique lifting of f to a continuous map $\hat{f} : \hat{X} \to Y$ making the diagram


$$X \hookrightarrow \hat{X} \xrightarrow{\hat{f}} Y, \qquad \hat{f}|_X = f$$


commute. Up to isomorphism, the completion of X is the unique metric space satisfying this property.
Chapter 193


13J25 — Ordered rings 


193.1 ordered ring 


An ordered ring is a commutative ring R with an ordering relation < such that, for every $a, b, c \in R$:


1. If a < b, then a + c < b + c.


2. If a < b and 0 < c, then $c \cdot a < c \cdot b$.
Chapter 194


13J99 — Miscellaneous 


194.1 topological ring 


A ring R which is a topological space is called a topological ring if the addition and multiplication functions are continuous functions from $R \times R$ to R.


A field which is a topological ring is called a topological field.
Chapter 195


13N15 — Derivations 


195.1 derivation 


Let k be a field. A derivation d on a k-algebra V is a linear transformation $d : V \to V$ satisfying the properties


• $d(x + y) = dx + dy$
• $d(x \cdot y) = x \cdot dy + dx \cdot y$
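

The standard example is the formal derivative on a polynomial algebra k[t]. The following sketch checks the product rule on integer-coefficient polynomials stored as coefficient lists (lowest degree first); the helper names `poly_mul`, `poly_add`, `trim`, `d` and `leibniz_holds` are invented for this illustration.

```python
def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def trim(p):
    # Drop trailing zero coefficients so equal polynomials compare equal.
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def d(p):
    # Formal derivative; a constant maps to the zero polynomial [0].
    return [i * c for i, c in enumerate(p)][1:] or [0]

def leibniz_holds(p, q):
    lhs = d(poly_mul(p, q))
    rhs = poly_add(poly_mul(p, d(q)), poly_mul(d(p), q))
    return trim(lhs) == trim(rhs)
```

For p = 1 + 2t and q = 3t², both sides of the product rule come out as 6t + 18t².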
Chapter 196


13P10 — Polynomial ideals, Grobner 
bases 


196.1 Gröbner basis


Definition of monomial orderings and support:
Let F be a field, and let S be the set of monomials in $F[x_1, \ldots, x_n]$, the polynomial ring in n indeterminates. A monomial ordering is a total ordering $\leq$ on S which satisfies


1. $a \leq b$ implies that $ac \leq bc$ for all $a, b, c \in S$.
2. $1 \leq a$ for all $a \in S$.


Henceforth, assume that we have fixed a monomial ordering. Take $a \in F[x_1, \ldots, x_n]$. Define the support of a, denoted supp(a), to be the set of monomials of a with nonzero coefficients. Then define M(a) = max(supp(a)).
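

A concrete monomial ordering is the lexicographic one. In the sketch below, which assumes that particular ordering, a monomial is stored as its tuple of exponents and a polynomial as a dictionary from exponent tuples to coefficients; Python's built-in tuple comparison is then exactly lexicographic order. All function names are hypothetical.

```python
from itertools import product

def mono_mul(a, b):
    """Multiply monomials by adding exponent tuples."""
    return tuple(i + j for i, j in zip(a, b))

def support(poly):
    """supp(a): the monomials of a with nonzero coefficient."""
    return {m for m, c in poly.items() if c != 0}

def leading_monomial(poly):
    """M(a) = max(supp(a)) under lex order (raises ValueError for a = 0)."""
    return max(support(poly))

def check_axioms(max_exp=2, nvars=2):
    """Verify both ordering axioms on all monomials with small exponents."""
    monos = list(product(range(max_exp + 1), repeat=nvars))
    one = (0,) * nvars
    compatible = all(mono_mul(a, c) <= mono_mul(b, c)
                     for a in monos for b in monos for c in monos if a <= b)
    return compatible and all(one <= a for a in monos)
```

For a = x²y + 3xy³ − 5 in F[x, y], the support is {x²y, xy³, 1} and M(a) = x²y, because lex order compares the exponent of x first.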


A partial order on $F[x_1, \ldots, x_n]$:


We can extend our monomial ordering to a partial ordering on $F[x_1, \ldots, x_n]$ as follows: Let $a, b \in F[x_1, \ldots, x_n]$. If $\mathrm{supp}(a) \neq \mathrm{supp}(b)$, we say that $a < b$ if $\max(\mathrm{supp}(a) - \mathrm{supp}(b)) < \max(\mathrm{supp}(b) - \mathrm{supp}(a))$.


It can be shown that:


1. The relation defined above is indeed a partial order on $F[x_1, \ldots, x_n]$.


2. Every descending chain $p_1(x_1, \ldots, x_n) > p_2(x_1, \ldots, x_n) > \ldots$ with $p_i \in F[x_1, \ldots, x_n]$ is finite.


A division algorithm for $F[x_1, \ldots, x_n]$:


We can then formulate a division algorithm for $F[x_1, \ldots, x_n]$:
Let $(f_1, \ldots, f_s)$ be an ordered s-tuple of polynomials, with $f_i \in F[x_1, \ldots, x_n]$. Then for each $f \in F[x_1, \ldots, x_n]$, there exist $a_1, \ldots, a_s, r \in F[x_1, \ldots, x_n]$ with r unique, such that


1. $f = a_1 f_1 + \cdots + a_s f_s + r$.
2. For each $i = 1, \ldots, s$, $M(f_i)$ does not divide any monomial in supp(r).


Furthermore, if $a_i f_i \neq 0$ for some i, then $M(a_i f_i) \leq M(f)$.


Definition of Gröbner basis:
Let I be a nonzero ideal of $F[x_1, \ldots, x_n]$. A finite set $T \subset I$ of polynomials is a Gröbner basis for I if for all $b \in I$ with $b \neq 0$ there exists $p \in T$ such that $M(p) \mid M(b)$.


Existence of Gröbner bases:
Every ideal $I \subset F[x_1, \ldots, x_n]$ other than the zero ideal has a Gröbner basis. Additionally, any Gröbner basis for I is also a basis of I.
Chapter 197


14-00 — General reference works 
(handbooks, dictionaries, 
bibliographies, etc.) 


197.1 Picard group 


The Picard group of a variety, scheme, or more generally locally ringed space $(X, \mathcal{O}_X)$ is the group of isomorphism classes of locally free $\mathcal{O}_X$-modules of rank 1 with tensor product over $\mathcal{O}_X$ as the operation.


It is not difficult to see this is isomorphic to $H^1(X, \mathcal{O}_X^*)$, the first sheaf cohomology group of the multiplicative sheaf $\mathcal{O}_X^*$ which consists of the units of $\mathcal{O}_X$.
197.2 affine space


Affine space of dimension n over a field k is simply the set of ordered n-tuples $k^n$.


197.3 affine variety


An affine variety over an algebraically closed field k is a subset of some affine space $k^n$ over k which can be described as the vanishing set of finitely many polynomials in n variables with coefficients in k, and which cannot be written as the union of two smaller such sets.


For example, the locus described by $Y - X^2 = 0$ as a subset of $\mathbb{C}^2$ is an affine variety over the complex numbers. But the locus described by $YX = 0$ is not (as it is the union of the loci X = 0 and Y = 0).


One can define a subset of affine space $k^n$ or an affine variety in $k^n$ to be closed if it is a subset defined by the vanishing set of finitely many polynomials in n variables with coefficients in k. The closed subsets then actually satisfy the requirements for closed sets in a topology, so this defines a topology on the affine variety known as the Zariski topology. The definition above has the extra condition that an affine variety not be the union of two proper closed subsets, i.e. it is required to be an irreducible topological space in the Zariski topology. Anything satisfying the definition except possibly the irreducibility condition is known as an (affine) algebraic set.


A quasi-affine variety is then an open set (in the Zariski topology) of an affine variety.


Note that some geometers do not require what they call a variety to be irreducible, that is, they call algebraic sets varieties. The most prevalent definition however requires varieties to be irreducible.


References: Hartshorne, “Algebraic Geometry.” 
197.4 dual isogeny


Let E and E' be elliptic curves over a field K of characteristic $\neq 2, 3$, let $f : E \to E'$ be an isogeny of degree m, and let [m] denote the multiplication-by-m isogeny on E. Then there exists a unique isogeny $\hat{f} : E' \to E$, called the dual isogeny to f, such that $\hat{f} \circ f = [m]$.


Often only the existence of a dual isogeny is needed, but the construction is explicit via the exact sequence


$$E' \to \mathrm{Div}^0(E') \xrightarrow{f^*} \mathrm{Div}^0(E) \to E,$$

where $\mathrm{Div}^0$ is the group of divisors of degree 0 on an elliptic curve.
Version: 2 Owner: mathcam Author(s): mathcam, nerdy2 
197.5 finite morphism 


A finite morphism of affine schemes $f : \mathrm{Spec}\, A \to \mathrm{Spec}\, B$ is a morphism with the property that the associated homomorphism of rings $f^* : B \to A$ makes A into a finite B-algebra, that is, A is finitely generated as a B-module.


Likewise, a finite morphism of affine varieties $f : V \to W$ is a morphism with the property that the associated homomorphism of rings $f^* : A(W) \to A(V)$ makes A(V) into a finite A(W)-module, where A(V) denotes the coordinate ring of V.


A morphism $f : X \to Y$ of schemes is finite if Y has a covering by finitely many open affine schemes $U_i$, such that $f^{-1}(U_i) = V_i$ is an open affine subscheme of X for each i, and the induced map $f|_{V_i} : V_i \to U_i$ is finite for each i.


Likewise, a morphism $f : X \to Y$ of varieties is finite if Y has a covering by finitely many open affine varieties $U_i$ such that $f^{-1}(U_i) = V_i$ is an open affine subvariety of X for each i, and the induced map $f|_{V_i} : V_i \to U_i$ is finite for each i.


As an example, consider the map $f : \mathbb{A}^1 \to \mathbb{A}^1$ given by $x \mapsto x^2$ where $\mathbb{A}^1$ is the affine line over some algebraically closed field k. The associated map of rings is $k[x] \to k[x]$, $x \mapsto x^2$, which clearly is finite, so the original morphism of affine varieties is finite.
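

To spell out why this ring map is finite, one can split an arbitrary polynomial into its even and odd parts. The short derivation below is a worked illustration (the names a and b for the two parts are introduced here); it shows that 1 and x generate k[x] as a module over the image k[x²].

```latex
p(x) \;=\; \sum_i c_{2i}\, x^{2i} \;+\; x \sum_i c_{2i+1}\, x^{2i}
     \;=\; a(x^2)\cdot 1 \;+\; b(x^2)\cdot x,
\qquad\text{hence}\qquad
k[x] \;=\; k[x^2]\cdot 1 \;+\; k[x^2]\cdot x .
```

Two generators suffice, which matches the geometric picture: a general point of the target has two preimages under the squaring map when the characteristic is not 2.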
197.6 isogeny


Let E and E' be elliptic curves over a field k. An isogeny between E and E' is a finite morphism $f : E \to E'$ that preserves basepoints.


The two curves are called isogenous if there is an isogeny between them. This is an equivalence relation, symmetry being due to the existence of the dual isogeny. Every isogeny is an algebraic homomorphism and thus induces homomorphisms of the groups of the elliptic curves for k-valued points.
197.7 line bundle


In algebraic geometry, the term line bundle refers to a locally free coherent sheaf of rank 1, also called an invertible sheaf. In manifold theory, it refers to a real or complex one dimensional vector bundle. These notions are equivalent on a non-singular complex algebraic variety X: given a one dimensional vector bundle, its sheaf of holomorphic sections is locally free and of rank 1. Similarly, given a locally free sheaf $\mathcal{F}$ of rank one, the space


$$L = \bigcup_{x \in X} \mathcal{F}_x / \mathfrak{m}_x \mathcal{F}_x,$$


given the coarsest topology for which sections of $\mathcal{F}$ define continuous functions, is a vector bundle of complex dimension 1 over X, with the obvious map taking the stalk over a point to that point.
197.8 nonsingular variety


A variety over an algebraically closed field k is nonsingular at a point x if the local ring $\mathcal{O}_x$ is a regular local ring. Equivalently, if around the point one has an open affine neighborhood wherein the variety is cut out by certain polynomials $F_1, \ldots, F_n$ of m variables $x_1, \ldots, x_m$, then it is nonsingular at x if the Jacobian has maximal rank at that point. Otherwise, x is a singular point.


A variety is nonsingular if it is nonsingular at each point.


Over the real or complex numbers, nonsingularity corresponds to "smoothness": at nonsingular points, varieties are locally real or complex manifolds (this is simply the implicit function theorem). Singular points generally have "corners" or self-intersections. Typical examples are the curves $x^2 = y^3$, which has a cusp at (0,0) and is nonsingular everywhere else, and $x^2(x + 1) = y^2$, which has a self-intersection at (0,0) and is nonsingular everywhere else.
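

The Jacobian criterion can be checked by hand for the two curves. The computation below is a worked illustration that assumes the cuspidal curve is $x^2 = y^3$ and the self-intersecting curve is $x^2(x+1) = y^2$, over a field of characteristic different from 2 and 3; the names f and g for the defining polynomials are introduced here.

```latex
f = x^2 - y^3: \qquad
\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) = (2x,\; -3y^2),
\quad\text{which vanishes only at } (0,0), \text{ a point of the curve.}
\\[6pt]
g = x^3 + x^2 - y^2: \qquad
\left(\frac{\partial g}{\partial x}, \frac{\partial g}{\partial y}\right) = (3x^2 + 2x,\; -2y),
\quad\text{which vanishes at } (0,0) \text{ and } \left(-\tfrac{2}{3}, 0\right).
```

Of the two critical points of g, only (0,0) lies on the curve, since $g(-2/3, 0) = 4/27 \neq 0$. In both cases the Jacobian drops rank at exactly one point of the curve, the origin.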
197.9 projective space


Projective space and homogeneous coordinates. Let K be a field. Projective space of dimension n over K, typically denoted by $KP^n$, is the set of lines passing through the origin in $K^{n+1}$. More formally, consider the equivalence relation $\sim$ on the set of non-zero points $K^{n+1} \setminus \{0\}$ defined by


$$x \sim \lambda x, \qquad x \in K^{n+1} \setminus \{0\}, \quad \lambda \in K \setminus \{0\}.$$

Projective space is defined to be the set of the corresponding equivalence classes.


Every $x = (x_0, \ldots, x_n) \in K^{n+1} \setminus \{0\}$ determines an element of projective space, namely the line passing through x. Formally, this line is the equivalence class [x], or $[x_0 : x_1 : \ldots : x_n]$, as it is commonly denoted. The numbers $x_0, \ldots, x_n$ are referred to as homogeneous coordinates of the line. Homogeneous coordinates differ from ordinary coordinate systems in that a given element of projective space is labelled by multiple homogeneous "coordinates".


Affine coordinates. Projective space also admits a more conventional type of coordinate system, called affine coordinates. Let $A_0 \subset KP^n$ be the subset of all elements $p = [x_0 : x_1 : \ldots : x_n] \in KP^n$ such that $x_0 \neq 0$. We then define the functions


$$X_i : A_0 \to K, \qquad i = 1, \ldots, n,$$


according to


$$X_i(p) = \frac{x_i}{x_0},$$

where $(x_0, x_1, \ldots, x_n)$ is any element of the equivalence class representing p. This definition makes sense because other elements of the same equivalence class have the form


$$(y_0, y_1, \ldots, y_n) = (\lambda x_0, \lambda x_1, \ldots, \lambda x_n)$$


for some non-zero $\lambda \in K$, and hence


$$\frac{y_i}{y_0} = \frac{\lambda x_i}{\lambda x_0} = \frac{x_i}{x_0}.$$


The functions $X_1, \ldots, X_n$ are called affine coordinates relative to the hyperplane


$$H_0 = \{x_0 = 1\} \subset K^{n+1}.$$


Geometrically, affine coordinates can be described by saying that the elements of $A_0$ are lines in $K^{n+1}$ that are not parallel to $H_0$, and that every such line intersects $H_0$ in exactly one point. Conversely, points of $H_0$ are represented by tuples $(1, x_1, \ldots, x_n)$ with $(x_1, \ldots, x_n) \in K^n$, and each such point uniquely labels a line $[1 : x_1 : \ldots : x_n]$ in $A_0$.


It must be noted that a single system of affine coordinates does not cover all of projective space. However, it is possible to define a system of affine coordinates relative to every hyperplane in $K^{n+1}$ that does not contain the origin. In particular, we get n + 1 different systems of affine coordinates corresponding to the hyperplanes $\{x_i = 1\}$, $i = 0, 1, \ldots, n$. Every element of projective space is covered by at least one of these n + 1 systems of coordinates.
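

The relationship between homogeneous and affine coordinates can be made concrete over the rationals. In the sketch below, points are nonzero tuples of integers, and the function names `same_point` and `affine_coords` as well as the `chart` parameter are invented for this illustration.

```python
from fractions import Fraction

def same_point(x, y):
    """Nonzero tuples x, y give the same projective point iff they are
    proportional, i.e. all 2x2 minors x_i*y_j - x_j*y_i vanish."""
    n = len(x)
    return all(x[i] * y[j] == x[j] * y[i] for i in range(n) for j in range(n))

def affine_coords(x, chart=0):
    """Affine coordinates x_i / x_chart (i != chart) in the chart x_chart != 0."""
    if x[chart] == 0:
        raise ValueError("point does not lie in this affine chart")
    return tuple(Fraction(xi, x[chart]) for i, xi in enumerate(x) if i != chart)
```

The tuples (1, 2, 3) and (2, 4, 6) label the same point and have the same affine coordinates (2, 3) in the chart x_0 ≠ 0. The point [0 : 1 : 1] is missed by that chart but is covered by the chart x_1 ≠ 0.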


Projective automorphisms. The invertible linear transformations of $K^{n+1}$ determine a corresponding group of automorphisms of projective space. Let $A : K^{n+1} \to K^{n+1}$ be a non-singular linear transformation. The corresponding projective automorphism $[A] : KP^n \to KP^n$ is defined to be the transformation with action


$$[x] \mapsto [Ax], \qquad x \in K^{n+1} \setminus \{0\}.$$


It is evident that for every non-zero $\lambda \in K$ the transformation $\lambda A$ gives the same projective automorphism as A. For this reason, it is convenient to identify the group of projective automorphisms with


$$PSL_{n+1}(K) = SL_{n+1}(K) / \Omega_{n+1}.$$


Here $SL_{n+1}$ denotes the "special" group of unimodular linear transformations, that is, transformations of $K^{n+1}$ having determinant 1. The symbol $\Omega_{n+1}$ denotes the subgroup generated by elements $\omega I$, where $\omega$ is an $(n+1)$st root of unity. The unimodular condition is almost sufficient to uniquely specify a linear transformation to represent a given projective action. However, note that


$$\det(\lambda A) = \lambda^{n+1} \det(A), \qquad A \in SL_{n+1}, \quad \lambda \in K,$$

and hence if the field K admits non-trivial $(n+1)$st roots of unity $\omega$, then multiplication by such an $\omega$ preserves the determinant. Hence, the projective action of $A \in SL_{n+1}$ coincides with the projective action of $\omega A$, making it necessary to quotient $SL_{n+1}$ by the normal subgroup $\Omega_{n+1}$ in order to obtain the group of projective automorphisms.
197.10 projective variety


Given a homogeneous polynomial F of degree d in n+1 variables $X_0, \ldots, X_n$ and a point $[x_0 : \cdots : x_n]$, we cannot evaluate F at that point, because it has multiple such representations, but since $F(\lambda x_0, \ldots, \lambda x_n) = \lambda^d F(x_0, \ldots, x_n)$ we can say whether any such representation (and hence all) vanish at that point.


A projective variety over an algebraically closed field k is a subset of some projective space $\mathbb{P}^n_k$ over k which can be described as the common vanishing locus of finitely many homogeneous polynomials with coefficients in k, and which is not the union of two such smaller loci.


Version: 4 Owner: nerdy2 Author(s): nerdy2 


197.11 quasi-finite morphism 


A morphism $f : X \to Y$ of schemes or varieties is quasi-finite if for each $y \in Y$, the fiber $f^{-1}(y)$ is a finite set.
Chapter 198


14A10 — Varieties and morphisms 


198.1 Zariski topology 


Let $\mathbb{A}^n_k$ denote the affine space $k^n$ over a field k. The Zariski topology on $\mathbb{A}^n_k$ is defined to be the topology whose closed sets are the sets


$$V(I) := \{x \in \mathbb{A}^n_k \mid f(x) = 0 \text{ for all } f \in I\} \subset \mathbb{A}^n_k,$$


where $I \subset k[X_1, \ldots, X_n]$ is any ideal in the polynomial ring $k[X_1, \ldots, X_n]$. For any affine variety $V \subset \mathbb{A}^n_k$, the Zariski topology on V is defined to be the subspace topology induced on V as a subset of $\mathbb{A}^n_k$.


Let $\mathbb{P}^n_k$ denote n-dimensional projective space over k. The Zariski topology on $\mathbb{P}^n_k$ is defined to be the topology whose closed sets are the sets

$$V(I) := \{x \in \mathbb{P}^n_k \mid f(x) = 0 \text{ for all } f \in I\} \subset \mathbb{P}^n_k,$$


where $I \subset k[X_0, \ldots, X_n]$ is any homogeneous ideal in the graded k-algebra $k[X_0, \ldots, X_n]$. For any projective variety $V \subset \mathbb{P}^n_k$, the Zariski topology on V is defined to be the subspace topology induced on V as a subset of $\mathbb{P}^n_k$.


The Zariski topology is the predominant topology used in the study of algebraic geometry. Every regular morphism of varieties is continuous in the Zariski topology (but not every continuous map in the Zariski topology is a regular morphism). In fact, the Zariski topology is the weakest topology on varieties making points in $\mathbb{A}^1_k$ closed and regular morphisms continuous.
198.2 algebraic map


A map $f : X \to Y$ between quasi-affine varieties $X \subset k^n$, $Y \subset k^m$ over a field k is called algebraic if there is a map $f' : k^n \to k^m$ whose component functions are polynomials, such that $f'$ restricts to f on X.


Alternatively, f is algebraic if the pullback map $f^* : C(Y) \to C(X)$ takes the coordinate ring of Y, k[Y], to the coordinate ring of X, k[X].
198.3 algebraic sets and polynomial ideals


Suppose k is an algebraically closed field. Let $\mathbb{A}^n_k$ denote affine n-space over k.
For $S \subseteq k[x_1, \ldots, x_n]$, define V(S), the zero set of S, by


$$V(S) = \{(a_1, \ldots, a_n) \in k^n \mid f(a_1, \ldots, a_n) = 0 \text{ for all } f \in S\}.$$


We say that $Y \subseteq \mathbb{A}^n_k$ is an algebraic set if there exists $T \subseteq k[x_1, \ldots, x_n]$ such that Y = V(T). The subsets of $\mathbb{A}^n_k$ which are algebraic sets are the closed sets of the Zariski topology on $\mathbb{A}^n_k$.


For $Y \subseteq \mathbb{A}^n_k$, define the ideal of Y in $k[x_1, \ldots, x_n]$ by


$$I(Y) = \{f \in k[x_1, \ldots, x_n] \mid f(P) = 0 \text{ for all } P \in Y\}.$$


It is easily shown that I(Y) is an ideal of $k[x_1, \ldots, x_n]$.


Thus we have defined a function V mapping from subsets of $k[x_1, \ldots, x_n]$ to algebraic sets in $\mathbb{A}^n_k$, and a function I mapping from subsets of $\mathbb{A}^n_k$ to ideals of $k[x_1, \ldots, x_n]$.


These maps have the following properties:


1. $S_1 \subseteq S_2 \subseteq k[x_1, \ldots, x_n]$ implies $V(S_1) \supseteq V(S_2)$.

2. $Y_1 \subseteq Y_2 \subseteq \mathbb{A}^n_k$ implies $I(Y_1) \supseteq I(Y_2)$.

3. For any ideal $\mathfrak{a} \subset k[x_1, \ldots, x_n]$, $I(V(\mathfrak{a})) = \mathrm{Rad}(\mathfrak{a})$.

4. For any $Y \subset \mathbb{A}^n_k$, $V(I(Y)) = \overline{Y}$, the closure of Y in the Zariski topology.


From the above, we see that there is a 1-1 correspondence between algebraic sets in $\mathbb{A}^n_k$ and radical ideals of $k[x_1, \ldots, x_n]$. Furthermore, an algebraic set $Y \subset \mathbb{A}^n_k$ is an affine variety if and only if I(Y) is a prime ideal.
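

Two small cases on the affine line illustrate properties 3 and 4. They are worked here as an example and are not taken from the original entry; the second one takes k to be the complex numbers.

```latex
\mathfrak{a} = (x^2) \subset k[x]: \qquad
V(\mathfrak{a}) = \{0\}, \qquad
I(V(\mathfrak{a})) = I(\{0\}) = (x) = \mathrm{Rad}\big((x^2)\big).
\\[6pt]
Y = \mathbb{Z} \subset \mathbb{A}^1_{\mathbb{C}}: \qquad
I(Y) = (0) \ \text{(a nonzero polynomial has finitely many roots)}, \qquad
V(I(Y)) = \mathbb{A}^1_{\mathbb{C}} = \overline{Y}.
```

The first case shows that V forgets multiplicity, which is why only radical ideals appear in the correspondence. The second shows that an infinite subset of the line is Zariski dense.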
198.4 noetherian topological space


A topological space X is called noetherian if it satisfies the descending chain condition for closed subsets: for any sequence


$$Y_1 \supseteq Y_2 \supseteq \cdots$$

of closed subsets $Y_i$ of X, there is an integer m such that $Y_m = Y_{m+1} = \cdots$.


Example: 


The space $\mathbb{A}^n_k$ (affine n-space over a field k) under the Zariski topology is an example of a noetherian topological space. By properties of the ideal of a subset of $\mathbb{A}^n_k$, we know that if $Y_1 \supseteq Y_2 \supseteq \cdots$ is a descending chain of Zariski-closed subsets, then $I(Y_1) \subseteq I(Y_2) \subseteq \cdots$ is an ascending chain of ideals of $k[x_1, \ldots, x_n]$.


Since $k[x_1, \ldots, x_n]$ is a Noetherian ring, there exists an integer m such that $I(Y_m) = I(Y_{m+1}) = \cdots$. But because we have a 1-1 correspondence between radical ideals of $k[x_1, \ldots, x_n]$ and Zariski-closed sets in $\mathbb{A}^n_k$, we have $V(I(Y_i)) = Y_i$ for all i. Hence $Y_m = Y_{m+1} = \cdots$ as required.


Version: 3 Owner: saforres Author(s): saforres 


198.5 regular map 


A regular map $\phi : k^n \to k^m$ between affine spaces over an algebraically closed field is merely one given by polynomials. That is, there are m polynomials $F_1, \ldots, F_m$ in n variables such that the map is given by $\phi(x_1, \ldots, x_n) = (F_1(x), \ldots, F_m(x))$ where x stands for the many components $x_i$.


A regular map $\phi : V \to W$ between affine varieties is one which is the restriction of a regular map between affine spaces. That is, if $V \subset k^n$ and $W \subset k^m$, then there is a regular map $\psi : k^n \to k^m$ with $\psi(V) \subset W$ and $\phi = \psi|_V$. So, this is a map given by polynomials, whose image lies in the intended target.


A regular map between algebraic varieties is a locally regular map. That is, $\phi : V \to W$ is regular if around each point x there is an affine variety $V_x$ and around each point $\phi(x) \in W$ there is an affine variety $W_{\phi(x)}$ with $\phi(V_x) \subset W_{\phi(x)}$ and such that the restriction $V_x \to W_{\phi(x)}$ is a regular map of affine varieties.
198.6 structure sheaf


Let X be an irreducible algebraic variety over a field k, together with the Zariski topology. Fix a point $x \in X$ and let $U \subset X$ be any affine open subset of X containing x. Define


$$\mathfrak{o}_x := \{f/g \in k(U) \mid f, g \in k[U],\ g(x) \neq 0\},$$


where k[U] is the coordinate ring of U and k(U) is the fraction field of k[U]. The ring $\mathfrak{o}_x$ is independent of the choice of affine open neighborhood U of x.


The structure sheaf on the variety X is the sheaf of rings whose sections on any open subset $U \subset X$ are given by


$$\mathcal{O}_X(U) := \bigcap_{x \in U} \mathfrak{o}_x,$$


and where the restriction map for $V \subset U$ is the inclusion map $\mathcal{O}_X(U) \to \mathcal{O}_X(V)$.


There is an equivalence of categories under which an affine variety X with its structure sheaf corresponds to the prime spectrum of the coordinate ring k[X]. In fact, the topological embedding $X \to \mathrm{Spec}(k[X])$ gives rise to a lattice-preserving bijection$^1$ between the open sets of X and of Spec(k[X]), and the sections of the structure sheaf on X are isomorphic to the sections of the sheaf Spec(k[X]).


Version: 1 Owner: djao Author(s): djao 


1 Those who are fans of topos theory will recognize this map as an isomorphism of topoi.
Chapter 199


14A15 — Schemes and morphisms 


199.1 closed immersion 


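A morphism of schemes $f : (X, \mathcal{O}_X) \to (Y, \mathcal{O}_Y)$ is a closed immersion if:


1. As a map of topological spaces, $f : X \to Y$ is a homeomorphism from X onto a closed subset of Y.

2. The morphism of sheaves $\mathcal{O}_Y \to f_*\mathcal{O}_X$ associated with f is an epimorphism in the category of sheaves.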
199.2 coherent sheaf


Let R be a ring, and let X = Spec R be its prime spectrum. Given an R-module M, one can define a presheaf on X by defining its sections on an open set U to be $\mathcal{O}_X(U) \otimes_R M$. We call the sheafification of this $\tilde{M}$, and a sheaf of this form on X is called quasi-coherent. If M is a finitely generated module, then $\tilde{M}$ is called coherent. A sheaf on an arbitrary scheme X is called (quasi-)coherent if it is (quasi-)coherent on each open affine subset of X.
199.3 fibre product


Let S be a scheme, and let $i : X \to S$ and $j : Y \to S$ be schemes over S. A fibre product of X and Y over S is a scheme $X \times_S Y$ together with morphisms


$$p : X \times_S Y \to X$$
$$q : X \times_S Y \to Y$$

satisfying $i \circ p = j \circ q$, such that given any scheme Z with morphisms


$$x : Z \to X$$
$$y : Z \to Y$$


where $i \circ x = j \circ y$, there exists a unique morphism


$$(x, y) : Z \to X \times_S Y$$


making the diagram commute, that is, $p \circ (x, y) = x$ and $q \circ (x, y) = y$. In other words, a fibre product is an object $X \times_S Y$, together with morphisms p, q making the diagram commute, with the universal property that any other collection (Z, x, y) forming such a commutative diagram maps into $(X \times_S Y, p, q)$.


Fibre products of schemes always exist and are unique up to canonical isomorphism.


Other notes: Fibre products are also called pullbacks and can be defined in any category using the same definition (but need not exist in general). For example, they always exist in the category of modules over a fixed ring, as well as in the category of groups.
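

The simplest setting in which to see the construction is the category of finite sets, where the fibre product is the set of pairs that agree after mapping to S. The sketch below assumes that setting (it is not about schemes), and the names `fibre_product` and `induced_map` are invented for the example.

```python
def fibre_product(X, Y, i, j):
    """Pairs (x, y) with i(x) == j(y); the projections p, q are the
    two coordinate maps."""
    return {(x, y) for x in X for y in Y if i(x) == j(y)}

def induced_map(Z, x, y):
    """The unique map Z -> X x_S Y determined by x : Z -> X and y : Z -> Y,
    returned as a dictionary; it is valid when i(x(z)) == j(y(z)) for all z."""
    return {z: (x(z), y(z)) for z in Z}
```

With X = {0, ..., 5}, Y = {0, ..., 3} and both structure maps given by reduction modulo 2, the fibre product consists of the 12 pairs of equal parity. Composing the induced map with either projection recovers x or y, which is the commutativity required by the universal property.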
199.4 prime spectrum


199.4.1 Spec as a set


Let R be any commutative ring with identity. The prime spectrum Spec(R) of R is defined to be the set

$$\{P \subsetneq R \text{ such that } P \text{ is a prime ideal of } R\}.$$

For any subset A of R, we define the variety of A to be the set

$$V(A) := \{P \in \mathrm{Spec}(R) \text{ such that } A \subseteq P\} \subseteq \mathrm{Spec}(R).$$

It is enough to restrict attention to subsets of R which are ideals, since, for any subset A of R, we have V(A) = V(I) where I is the ideal generated by A. In fact, even more is true: $V(I) = V(\sqrt{I})$ where $\sqrt{I}$ denotes the radical of the ideal I.


199.4.2 Spec as a topological space 


We impose a topology on Spec(R) by defining the sets V(A) to be the collection of closed subsets of Spec(R) (that is, a subset of Spec(R) is open if and only if it equals the complement of V(A) for some subset A). The equations


$$\bigcap_{\alpha} V(I_\alpha) = V\Big(\bigcup_{\alpha} I_\alpha\Big), \qquad \bigcup_{i=1}^{n} V(I_i) = V\Big(\bigcap_{i=1}^{n} I_i\Big)$$


for any ideals $I_\alpha$, $I_i$ of R, establish that this collection does constitute a topology on Spec(R).


This topology is called the Zariski topology in light of its relationship to the Zariski topology on an algebraic variety (see section 199.4.4 below). Note that a point $P \in \mathrm{Spec}(R)$ is closed if and only if $P \subset R$ is a maximal ideal.


A distinguished open set of Spec(R) is defined to be an open set of the form

$$\mathrm{Spec}(R)_f := \{P \in \mathrm{Spec}(R) \text{ such that } f \notin P\} = \mathrm{Spec}(R) \setminus V(\{f\}),$$


for any element $f \in R$. The collection of distinguished open sets forms a topological basis for the open sets of Spec(R). In fact, we have


$$\mathrm{Spec}(R) \setminus V(A) = \bigcup_{f \in A} \mathrm{Spec}(R)_f.$$


The topological space Spec(R) has the following additional properties:


• Spec(R) is compact (but almost never Hausdorff).


• A subset of Spec(R) is an irreducible closed set if and only if it equals V(P) for some
prime ideal P of R.


• For $f \in R$, let $R_f$ denote the localization of R at f. Then the topological spaces
$\operatorname{Spec}(R)_f$ and $\operatorname{Spec}(R_f)$ are naturally homeomorphic, via the correspondence sending
a prime ideal of R not containing f to the induced prime ideal in $R_f$.


• For $P \in \operatorname{Spec}(R)$, let $R_P$ denote the localization of R at the prime ideal P. Then the
subspace $\{Q \in \operatorname{Spec}(R) : Q \subseteq P\}$ of Spec(R) and $\operatorname{Spec}(R_P)$ are naturally homeomorphic, via
the correspondence sending a prime ideal of R contained in P to the induced prime
ideal in $R_P$.


199.4.3 Spec as a sheaf 


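For convenience, we adopt the usual convention of writing X for Spec(R) and $X_f$ for $\operatorname{Spec}(R)_f$.
For any $f \in R$ and $P \in X_f$, let $\iota_{f,P}\colon R_f \to R_P$ be the natural map. Define a presheaf of rings
$\mathcal{O}_X$ on X by setting

$$\mathcal{O}_X(U) := \Bigl\{ (s_P) \in \prod_{P \in U} R_P \;\Bigm|\; U \text{ has an open cover } \{X_{f_\alpha}\} \text{ with elements } s_\alpha \in R_{f_\alpha} \text{ such that } s_P = \iota_{f_\alpha,P}(s_\alpha) \text{ whenever } P \in X_{f_\alpha} \Bigr\}$$

for each open set $U \subset X$. The restriction map $\operatorname{res}_{U,V}\colon \mathcal{O}_X(U) \to \mathcal{O}_X(V)$ is the map induced
by the projection map

$$\prod_{P \in U} R_P \to \prod_{P \in V} R_P$$

for each open subset $V \subset U$. The presheaf $\mathcal{O}_X$ satisfies the following properties:


1. $\mathcal{O}_X$ is a sheaf.

2. $\mathcal{O}_X(X_f) = R_f$ for every $f \in R$.

3. The stalk $(\mathcal{O}_X)_P$ is equal to $R_P$ for every $P \in X$. (In particular, X is a locally ringed space.)

4. The restriction sheaf of $\mathcal{O}_X$ to $X_f$ is isomorphic as a sheaf to $\mathcal{O}_{\operatorname{Spec}(R_f)}$.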


199.4.4 Relationship to algebraic varieties 


Spec(R) is sometimes called an affine scheme because of the close relationship between
affine varieties in $\mathbb{A}^n_k$ and the Spec of their corresponding coordinate rings. In fact, the
correspondence between the two is an equivalence of categories, although a complete
statement of this equivalence requires the notion of morphisms of schemes and will not be given
here. Nevertheless, we explain what we can of this correspondence below.


Let k be a field and write as usual $\mathbb{A}^n_k$ for the vector space $k^n$. Recall that an affine variety
V in $\mathbb{A}^n_k$ is the set of common zeros of some prime ideal $I \subset k[X_1, \ldots, X_n]$. The coordinate
ring of V is defined to be the ring $R := k[X_1, \ldots, X_n]/I$, and there is an embedding
$i\colon V \hookrightarrow \operatorname{Spec}(R)$ given by

$$i(a_1, \ldots, a_n) := (X_1 - a_1, \ldots, X_n - a_n) \in \operatorname{Spec}(R).$$

The function i is not a homeomorphism, because it is not a bijection (its image is contained
inside the set of maximal ideals of R). However, the map i does define an order preserving
bijection between the open sets of V and the open sets of Spec(R) in the Zariski topology.
This isomorphism between these two lattices of open sets can be used to equate the sheaf
Spec(R) with the structure sheaf of the variety V, showing that the two objects are identical
in every respect except for the minor detail of Spec(R) having more points than V.


The additional points of Spec(R) are valuable in many situations and a systematic study of
them leads to the general notion of schemes. As just one example, the classical Bezout's
theorem is only valid for algebraically closed fields, but admits a scheme-theoretic generalization
which holds over non-algebraically closed fields as well. We will not attempt to explain the
theory of schemes in detail, instead referring the interested reader to the references below.


REFERENCES 


1. Robin Hartshorne, Algebraic Geometry, Springer-Verlag New York, Inc., 1977 (GTM 52). 
2. David Mumford, The Red Book of Varieties and Schemes, Second Expanded Edition,
Springer-Verlag, 1999 (LNM 1358).


199.5 scheme


199.5.1 Definitions 


An affine scheme is a locally ringed space $(X, \mathcal{O}_X)$ with the property that there exists a
ring R (commutative, with identity) whose prime spectrum Spec(R) is isomorphic to X as
a locally ringed space.


A scheme is a locally ringed space $(X, \mathcal{O}_X)$ which has an open cover $\{U_\alpha\}_{\alpha \in I}$ with the
property that each open set $U_\alpha$, together with its restriction sheaf $\mathcal{O}_X|_{U_\alpha}$, is an affine scheme.


We define a morphism of schemes between two schemes (X, Ox) and (Y, Oy) to be a mor- 
phism of locally ringed spaces f : (X, Ox) — (Y, Oy). A scheme over Y is defined to be a 
scheme X together with a morphism of schemes X — Y. 


Note: Some authors, notably Mumford and Grothendieck, require that a scheme be separated
as well (and use the term prescheme to describe a scheme that is not separated), but we will
not impose this requirement.


199.5.2 Examples


• Every affine scheme is clearly a scheme as well. In particular, Spec(R) is a scheme for
any commutative ring R.


• Every variety can be interpreted as a scheme. An affine variety corresponds to the
prime spectrum of its coordinate ring, and a projective variety has an open cover by
affine pieces each of which is an affine variety, and hence an affine scheme.


199.6 separated scheme


A scheme X is defined to be a separated scheme if the morphism

$$d\colon X \to X \times_{\operatorname{Spec} \mathbb{Z}} X$$

into the fibre product $X \times_{\operatorname{Spec} \mathbb{Z}} X$ which is induced by the identity maps $i\colon X \to X$ in
each coordinate is a closed immersion.


Note the similarity to the definition of a Hausdorff topological space. In the situation of
topological spaces, a space X is Hausdorff if and only if the diagonal morphism $X \to X \times X$
is a closed embedding of topological spaces. The definition of a separated scheme is very
similar, except that the topological product is replaced with the scheme fibre product.
Version: 2 Owner: djao Author(s): djao 

199.7 singular set 

The singular set of a variety X is the set of singular points. This is a proper subvariety. A
subvariety Y of X is contained in the singular set if and only if its local ring $\mathcal{O}_Y$ is not regular.


Chapter 200


14A99 — Miscellaneous 


200.1 Cartier divisor 


On a scheme X, a Cartier divisor is a global section of the sheaf $\mathcal{K}^*/\mathcal{O}^*$, where $\mathcal{K}^*$ is the
multiplicative sheaf of meromorphic functions, and $\mathcal{O}^*$ the multiplicative sheaf of invertible
regular functions (the units of the structure sheaf).


More explicitly, a Cartier divisor is a choice of open cover $U_i$ of X, and meromorphic functions
$f_i \in \mathcal{K}^*(U_i)$, such that $f_i/f_j \in \mathcal{O}^*(U_i \cap U_j)$. Two Cartier divisors are regarded as the same
if the open cover of one is a refinement of the other, with the same functions attached to
the open sets, or if $f_i$ is replaced by $g f_i$ with $g \in \mathcal{O}^*$.


Intuitively, the only information carried by a Cartier divisor is where it vanishes, and the order
to which it does so there. Thus, a Cartier divisor should give us a Weil divisor, and vice versa.
On "nice" schemes (for example, nonsingular schemes over an algebraically closed field), it does.


200.2 General position


In the projective plane, 4 points are said to be in general position iff no three of them are
on the same line. Dually, 4 lines are in general position iff no three of them meet in the same
point. This definition naturally scales to more than four points/lines.


200.3 Serre's twisting theorem


Let X be a scheme and $\mathcal{L}$ an ample invertible sheaf on X. Then for any coherent sheaf $\mathcal{F}$,
and sufficiently large n, $H^i(\mathcal{F} \otimes \mathcal{L}^n) = 0$ for all $i > 0$, that is, the higher sheaf cohomology
of $\mathcal{F} \otimes \mathcal{L}^n$ is trivial.


200.4 ample


An invertible sheaf $\mathcal{L}$ on a scheme X is called ample if for any coherent sheaf $\mathcal{F}$, $\mathcal{F} \otimes \mathcal{L}^n$ is
globally generated for sufficiently large n. A sheaf $\mathcal{L}$ is ample if and only if $\mathcal{L}^m$ is very ample
for some m.


200.5 height of a prime ideal

Let R be a commutative ring. The height of a prime ideal $\mathfrak{p}$ is the supremum of all integers
n such that there exists a chain $\mathfrak{p}_0 \subset \cdots \subset \mathfrak{p}_n = \mathfrak{p}$ of distinct prime ideals.

The Krull dimension of R is the supremum of the heights of all the prime ideals of R.


200.6 invertible sheaf


A sheaf $\mathcal{L}$ of $\mathcal{O}_X$-modules on a ringed space X is called invertible if there is another sheaf
of $\mathcal{O}_X$-modules $\mathcal{L}'$ such that $\mathcal{L} \otimes \mathcal{L}' \cong \mathcal{O}_X$. A sheaf is invertible if and only if it is locally free
of rank 1, and its inverse is the sheaf $\mathcal{L}^\vee := \mathcal{H}om(\mathcal{L}, \mathcal{O}_X)$, by the obvious map.


The set of invertible sheaves forms an abelian group under tensor multiplication,
called the Picard group of X.


200.7 locally free


A sheaf $\mathcal{F}$ on a ringed space X is called locally free if for each point $x \in X$, there is an open
neighborhood U of x such that $\mathcal{F}|_U$ is free, or equivalently, $\mathcal{F}_x$, the stalk of $\mathcal{F}$ at x, is free as
an $\mathcal{O}_x$-module. If $\mathcal{F}_x$ is of finite rank n, then $\mathcal{F}$ is said to be of rank n.


200.8 normal irreducible varieties are nonsingular in codimension 1


Theorem 18. Let X be a normal irreducible variety. The singular set $S \subset X$ has codimension
2 or more.


Assume not. We may assume X is affine, since codimension is local. Now let $\mathfrak{u}$ be the ideal
of functions vanishing on S. This is an ideal of height 1, so the local ring of S, $\mathcal{O}_S = A(X)_{\mathfrak{u}}$,
where A(X) is the affine ring of X, is a 1-dimensional local ring, and integrally closed, since
X is normal. Any integrally closed 1-dimensional local domain is a discrete valuation ring, and thus
regular. But S is the singular set, so its local ring is not regular, a contradiction.


200.9 sheaf of meromorphic functions


Given a ringed space X, let $\mathcal{K}_X$ be the sheafification of the presheaf associating to each
open set U the fraction field of $\mathcal{O}_X(U)$, where $\mathcal{O}_X$ is the structure sheaf. This is called the
sheaf of meromorphic functions since on a complex algebraic variety, it is isomorphic to the
sheaf of functions which are meromorphic in the analytic sense.


200.10 very ample


An invertible sheaf $\mathcal{L}$ on a scheme X over a field k is called very ample if (1) at each point
$x \in X$, there is a global section $s \in \mathcal{L}(X)$ not vanishing at x, and (2) for each pair of points
$x, y \in X$, there is a global section $s \in \mathcal{L}(X)$ such that s vanishes at exactly one of x and y.


Equivalently, $\mathcal{L}$ is very ample if there is an embedding $f\colon X \to \mathbb{P}^n$ such that $f^*\mathcal{O}(1) = \mathcal{L}$.
If k is algebraically closed, Riemann-Roch shows that on a curve X of genus g, any invertible sheaf of
degree greater than or equal to $2g + 1$ is very ample.


Chapter 201


14C20 — Divisors, linear systems, 
invertible sheaves 


201.1 divisor 


A divisor D on a projective curve over an algebraically closed field is a formal
sum of points $D = \sum n_P P$ where only finitely many of the $n_P \in \mathbb{Z}$ are nonzero.
The degree of a divisor D is $\deg(D) = \sum n_P$.


Chapter 202


Rational and birational maps 


202.1 general type 


A variety is said to be of general type if its Kodaira dimension equals its dimension.


Chapter 203


14F05 — Vector bundles, sheaves, 
related constructions 


203.1 direct image (functor) 


If $f\colon X \to Y$ is a continuous map of topological spaces, and if Sheaves(X) is the category
of sheaves of sets on X (and similarly for Sheaves(Y)), then the direct image functor
$f_*\colon \mathrm{Sheaves}(X) \to \mathrm{Sheaves}(Y)$ sends a sheaf $\mathcal{F}$ on X to its direct image $f_*\mathcal{F}$
on Y. A morphism of sheaves $g\colon \mathcal{F} \to \mathcal{G}$ gives rise to a morphism of sheaves
$f_*g\colon f_*\mathcal{F} \to f_*\mathcal{G}$, and this determines a functor.


If $\mathcal{F}$ is a sheaf of abelian groups (or anything else), so is $f_*\mathcal{F}$, so likewise we get direct image
functors $f_*\colon \mathrm{Ab}(X) \to \mathrm{Ab}(Y)$, where Ab(X) is the category of sheaves of abelian groups
on X.


Chapter 204


14F20 — Etale and other Grothendieck 
topologies and cohomologies 


204.1 site 


A site is a category with a Grothendieck topology.


Chapter 205


14F25 — Classical real and complex 
cohomology 


205.1 Serre duality 


Serre duality is a theorem which can be thought of as a massive generalization of Poincaré duality
to an algebraic context.


The most general version of Serre duality states that on certain schemes X of dimension n,
including all projective varieties over any algebraically closed field k, there is a natural isomorphism

$$\operatorname{Ext}^i(\mathcal{F}, \omega) \cong H^{n-i}(X, \mathcal{F})^*,$$

where $\mathcal{F}$ is any coherent sheaf on X and $\omega$ is a fixed sheaf, called the dualizing sheaf.


In special cases, this reduces to more approachable forms. If X is nonsingular (or more
generally, Cohen-Macaulay), then $\omega$ is simply $\bigwedge^n \Omega$, where $\Omega$ is the sheaf of differentials on
X.


If $\mathcal{F}$ is locally free then

$$\operatorname{Ext}^i(\mathcal{F}, \omega) \cong \operatorname{Ext}^i(\mathcal{O}_X, \mathcal{F}^* \otimes \omega) \cong H^i(X, \mathcal{F}^* \otimes \omega),$$

so that we obtain the somewhat more familiar looking fact that there is a perfect pairing

$$H^i(X, \mathcal{F}^* \otimes \omega) \times H^{n-i}(X, \mathcal{F}) \to k.$$


205.2 sheaf cohomology


Let X be a topological space, and assume that the category of sheaves of abelian groups on
X has enough injectives. Then we define the sheaf cohomology $H^i(X, \mathcal{F})$ of a sheaf $\mathcal{F}$ to be
the right derived functors of the global section functor $\mathcal{F} \mapsto \Gamma(X, \mathcal{F})$.
Usually we are interested in the case where X is a scheme and $\mathcal{F}$ is a coherent sheaf. In this
case, it does not matter if we take the derived functors in the category of sheaves of abelian
groups or coherent sheaves.


Sheaf cohomology can be explicitly calculated using Čech cohomology. Choose an open cover
$\{U_j\}$ of X, indexed by $j \in \{1, \ldots, n\}$. We define

$$C^i(\mathcal{F}) = \prod \mathcal{F}(U_{j_0 \cdots j_i})$$

where the product is over $i + 1$ element subsets of $\{1, \ldots, n\}$ and $U_{j_0 \cdots j_i} = U_{j_0} \cap \cdots \cap U_{j_i}$. If
$s \in \mathcal{F}(U_{j_0 \cdots j_i})$ is thought of as an element of $C^i(\mathcal{F})$, then the differential

$$\partial(s) = \prod_{\ell} \Bigl( \prod_{k = j_\ell + 1}^{j_{\ell+1} - 1} (-1)^{\ell}\, s|_{U_{j_0 \cdots j_\ell k j_{\ell+1} \cdots j_i}} \Bigr)$$

makes $C^*(\mathcal{F})$ into a chain complex. The cohomology of this complex is denoted $\check{H}^i(X, \mathcal{F})$
and called the Čech cohomology of $\mathcal{F}$ with respect to the cover $\{U_j\}$. There is a natural map
$\check{H}^i(X, \mathcal{F}) \to H^i(X, \mathcal{F})$ which is an isomorphism for sufficiently fine covers.


Chapter 206


14G05 — Rational points 


206.1 Hasse principle 


Let V be an algebraic variety defined over a field K. By V(K) we denote the set of points
on V defined over K. Let $\bar{K}$ be an algebraic closure of K. For a valuation v of K, we write
$K_v$ for the completion of K at v. In this case, we can also consider V defined over $K_v$ and
talk about $V(K_v)$.


Definition 13. 


1. If V(K) is not empty we say that V is soluble in K.

2. If $V(K_v)$ is not empty then we say that V is locally soluble at v.

3. If V is locally soluble for all v then we say that V satisfies the Hasse condition, or
we say that V/K is everywhere locally soluble.


The Hasse Principle is the idea (or desire) that an everywhere locally soluble variety V
must have a rational point, i.e. a point defined over K. Unfortunately this is not true in general:
there are examples of varieties that satisfy the Hasse condition but have no rational points.


Example: A quadric (of any dimension) satisfies the Hasse principle. This was proved by
Minkowski for quadrics over Q and by Hasse for quadrics over a number field.


Chapter 207


14H37 — Automorphisms 


207.1 Frobenius morphism 


Let K be a field of characteristic $p > 0$ and let $q = p^r$. Let C be a curve defined over K
contained in $\mathbb{P}^N$, the projective space of dimension N. Define the homogeneous ideal of C
to be (the ideal generated by):

$$I(C) = \{ f \in K[X_0, \ldots, X_N] \mid \forall P \in C,\ f(P) = 0,\ f \text{ is homogeneous} \}$$

For $f \in K[X_0, \ldots, X_N]$ of the form $f = \sum_i a_i X_0^{i_0} \cdots X_N^{i_N}$ we define $f^{(q)} = \sum_i a_i^q X_0^{i_0} \cdots X_N^{i_N}$. We
define a new curve $C^{(q)}$ as the zero set of the ideal (generated by):

$$I(C^{(q)}) = \{ f^{(q)} \mid f \in I(C) \}$$

Definition 14. The q-th power Frobenius morphism is defined to be:

$$\phi\colon C \to C^{(q)}, \qquad \phi([x_0, \ldots, x_N]) = [x_0^q, \ldots, x_N^q]$$

In order to check that the Frobenius morphism is well defined we need to prove that

$$P = [x_0, \ldots, x_N] \in C \ \Rightarrow\ \phi(P) = [x_0^q, \ldots, x_N^q] \in C^{(q)}$$

This is equivalent to proving that for any $g \in I(C^{(q)})$ we have $g(\phi(P)) = 0$. Without loss of
generality we can assume that g is a generator of $I(C^{(q)})$, i.e. g is of the form $g = f^{(q)}$ for
some $f \in I(C)$. Then:

$$\begin{aligned}
g(\phi(P)) &= f^{(q)}(\phi(P)) = f^{(q)}([x_0^q, \ldots, x_N^q]) \\
&= (f([x_0, \ldots, x_N]))^q, && [a^q + b^q = (a + b)^q \text{ in characteristic } p] \\
&= (f(P))^q = 0, && [P \in C,\ f \in I(C)]
\end{aligned}$$

as desired.
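The identity $a^q + b^q = (a + b)^q$ used in the middle step comes down to the fact that every binomial coefficient $\binom{p}{k}$ with $0 < k < p$ is divisible by the prime p. The following is a small sanity check of that fact, not part of the original argument; the function name is hypothetical.

```python
from math import comb


def freshman_dream_holds(n):
    """True if every middle binomial coefficient C(n, k), 0 < k < n, is divisible by n.

    When this holds, (a + b)^n = a^n + b^n in any commutative ring of characteristic n.
    """
    return all(comb(n, k) % n == 0 for k in range(1, n))
```

The check succeeds for primes such as 7 and fails for composite numbers such as 4, where $\binom{4}{2} = 6$ is not divisible by 4. Iterating the identity r times gives the version for $q = p^r$.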


Example: Suppose E is an elliptic curve defined over $K = \mathbb{F}_q$, the field of $p^r$ elements. In
this case the Frobenius map is an automorphism of K, therefore

$$E = E^{(q)}$$

Hence the Frobenius morphism is an endomorphism (or isogeny) of the elliptic curve.


Chapter 208


14H45 — Special curves and curves of 
low genus 


208.1 Fermat’s spiral 


Fermat's spiral (or parabolic spiral) is an archimedean spiral with the equation:

$$r^2 = a^2\theta.$$

This curve was discovered by Fermat in 1636.


208.2 archimedean spiral


An archimedean spiral is a spiral with the following polar equation:

$$r = a\theta^{1/t},$$

where a is a real number, r is the radial distance, $\theta$ is the angle, and t is a constant.
For an archimedean spiral the curvature $\kappa$ is given by the following formula:

$$\kappa = \frac{t\,\theta^{1 - 1/t}\,(t^2\theta^2 + t + 1)}{a\,(t^2\theta^2 + 1)^{3/2}}$$
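The curvature formula can be checked numerically. The sketch below is an illustration rather than part of the original entry, and its function names are hypothetical. It compares the closed form with the signed curvature $(x'y'' - y'x'')/(x'^2 + y'^2)^{3/2}$ of the parametrization $(r\cos\theta, r\sin\theta)$, using central finite differences.

```python
import math


def closed_form_curvature(a, t, theta):
    """Curvature of r = a * theta**(1/t) from the closed formula above."""
    numerator = t * theta ** (1 - 1 / t) * (t * t * theta * theta + t + 1)
    denominator = a * (t * t * theta * theta + 1) ** 1.5
    return numerator / denominator


def numeric_curvature(a, t, theta, h=1e-4):
    """Signed curvature of the same curve, estimated by central differences."""
    def point(u):
        r = a * u ** (1 / t)
        return r * math.cos(u), r * math.sin(u)

    (x0, y0), (x1, y1), (x2, y2) = point(theta - h), point(theta), point(theta + h)
    dx, dy = (x2 - x0) / (2 * h), (y2 - y0) / (2 * h)
    ddx, ddy = (x2 - 2 * x1 + x0) / h ** 2, (y2 - 2 * y1 + y0) / h ** 2
    return (dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5
```

For t = 1 the closed form reduces to $(\theta^2 + 2)/(a(\theta^2 + 1)^{3/2})$, the curvature of the classical spiral $r = a\theta$.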
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208.3 folium of Descartes 


The folium of Descartes is a curve with the cartesian equation:

$$x^3 + y^3 = 3axy,$$

and the parametric equations:

$$x = \frac{3at}{1 + t^3}, \qquad y = \frac{3at^2}{1 + t^3}.$$

The folium of Descartes has as asymptote

$$d\colon y + x + a = 0,$$


and the that 


where $A_1$ and $A_2$ are the two areas from the figure.


208.4 spiral


Let c(s) be a curve and let $\tau(s)$ and $\kappa(s)$ be the torsion and the curvature of c(s). Then a
spiral is a curve for which the ratio $\kappa(s)/\tau(s)$ is constant for all s.


Chapter 209


14H50 — Plane and space curves 


209.1 torsion (space curve) 


Let $g\colon I \to \mathbb{R}^3$ be a parameterized space curve, assumed to be regular and free of points of inflection.
Physically, we conceive of g(t) as a particle moving through space. Let T(t), N(t), B(t)
denote the corresponding moving frame (tangent, normal, binormal). The speed of this particle is given by
$s(t) = \|g'(t)\|$.


In order for a moving particle to escape the osculating plane, it is necessary for the particle
to "roll" along the axis of its tangent vector, thereby lifting the normal acceleration vector
out of the osculating plane. The "rate of roll", that is to say the rate at which the osculating
plane rotates about the tangent vector, is given by $B(t) \cdot N'(t)$; it is a number that depends
on the speed of the particle. The rate of roll relative to the particle's speed is the quantity

$$\tau(t) = \frac{B(t) \cdot N'(t)}{s(t)} = \frac{(g'(t) \times g''(t)) \cdot g'''(t)}{\|g'(t) \times g''(t)\|^2},$$

called the torsion of the curve, a quantity that is invariant with respect to reparameterization.

The torsion $\tau(t)$ is, therefore, a measure of an intrinsic property of the oriented space curve,
another real number that can be covariantly assigned to the point g(t).
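As a worked check of the triple-product formula, not taken from the original entry, consider the circular helix $g(t) = (a\cos t, a\sin t, bt)$, whose torsion is the constant $b/(a^2 + b^2)$. The helper names below are hypothetical.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))


def torsion(d1, d2, d3):
    """Torsion from the derivatives g', g'', g''' at one parameter value."""
    c = cross(d1, d2)
    return dot(c, d3) / dot(c, c)


def helix_derivatives(a, b, t):
    """First three derivatives of g(t) = (a cos t, a sin t, b t)."""
    d1 = (-a * math.sin(t), a * math.cos(t), b)
    d2 = (-a * math.cos(t), -a * math.sin(t), 0.0)
    d3 = (a * math.sin(t), -a * math.cos(t), 0.0)
    return d1, d2, d3
```

The value returned by `torsion(*helix_derivatives(a, b, t))` does not depend on t, as expected for a helix.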
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Chapter 210 


14H52 — Elliptic curves 


210.1 Birch and Swinnerton-Dyer conjecture 


Let E be an elliptic curve over Q, and let L(E, s) be the L-series attached to E.


Conjecture 1 (Birch and Swinnerton-Dyer).


1. L(E, s) has a zero at s = 1 of order equal to the rank of E(Q).


2. Let R = rank(E(Q)). Then the residue of L(E, s) at s = 1, i.e. $\lim_{s \to 1} (s - 1)^{-R} L(E, s)$,
has a concrete expression involving the following invariants of E: the real period, the
Tate-Shafarevich group, the elliptic regulator and the Néron model of E.


J. Tate said about this conjecture: “This remarkable conjecture relates the behavior 
of a function L at a point where it is not at present known to be defined to the 
order of a group (Sha) which is not known to be finite!” 


The following is an easy consequence of the B-SD conjecture: 


Conjecture 2. The root number of E, denoted by w, indicates the parity of the rank of the
elliptic curve, that is, w = 1 if and only if the rank is even.


There has been a great amount of research towards the B-SD conjecture. For example, there 
are some particular cases which are already known: 


Theorem 19 (Coates, Wiles). Suppose E is an elliptic curve defined over an imaginary
quadratic number field K, with complex multiplication by K, and L(E, s) is the L-series of E. If
$L(E, 1) \neq 0$ then E(K) is finite.


210.2 Hasse's bound for elliptic curves over finite fields


Let E be an elliptic curve defined over a finite field $\mathbb{F}_q$ with $q = p^r$ elements ($p \in \mathbb{Z}$ is a
prime). The following theorem gives a bound on the size of $E(\mathbb{F}_q)$, denoted $N_q$, i.e. the number of points
of E defined over $\mathbb{F}_q$. This was first conjectured by Emil Artin (in his thesis!) and proved
by Helmut Hasse in the 1930's.


Theorem 20 (Hasse).

$$| N_q - q - 1 | \leq 2\sqrt{q}$$


Remark: Let $a_p = p + 1 - N_p$ as in the definition of the L-series of an elliptic curve. Then
Hasse's bound reads:

$$| a_p | \leq 2\sqrt{p}$$

This fact is key for the convergence of the L-series of E.
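Hasse's bound is easy to test on small examples. The following is a brute-force sketch, not an efficient point-counting algorithm. It assumes a short Weierstrass model $y^2 = x^3 + ax + b$ over $\mathbb{F}_p$ with p an odd prime not dividing $4a^3 + 27b^2$; the function names are hypothetical.

```python
def count_points(a, b, p):
    """Number of points of y^2 = x^3 + a*x + b over F_p, including the point at infinity."""
    # square_roots[v] = number of y in F_p with y^2 = v
    square_roots = [0] * p
    for y in range(p):
        square_roots[y * y % p] += 1
    affine = sum(square_roots[(x * x * x + a * x + b) % p] for x in range(p))
    return affine + 1  # add the point at infinity


def satisfies_hasse(a, b, p):
    """Check |a_p| <= 2*sqrt(p), written as a_p^2 <= 4p to stay in integers."""
    a_p = p + 1 - count_points(a, b, p)
    return a_p * a_p <= 4 * p
```

For instance, $y^2 = x^3 + x + 1$ has 9 points over $\mathbb{F}_5$, so $a_5 = -3$ and $|a_5| \leq 2\sqrt{5}$.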
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210.3 L-series of an elliptic curve 


Let E be an elliptic curve over Q with Weierstrass equation:

$$y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6$$

with coefficients $a_i \in \mathbb{Z}$. For p a prime in Z, define $N_p$ as the number of points in the
reduction of the curve modulo p (counting the point at infinity O), that is:

$$N_p = \#\bigl( \{O\} \cup \{ (x, y) \in \mathbb{F}_p^2 : y^2 + a_1xy + a_3y - x^3 - a_2x^2 - a_4x - a_6 \equiv 0 \bmod p \} \bigr)$$

Also, let $a_p = p + 1 - N_p$. We define the local part at p of the L-series to be:

$$L_p(T) = \begin{cases}
1 - a_pT + pT^2, & \text{if } E \text{ has good reduction at } p, \\
1 - T, & \text{if } E \text{ has split multiplicative reduction at } p, \\
1 + T, & \text{if } E \text{ has non-split multiplicative reduction at } p, \\
1, & \text{if } E \text{ has additive reduction at } p.
\end{cases}$$

Definition 15. The L-series of the elliptic curve E is defined to be:

$$L(E, s) = \prod_p \frac{1}{L_p(p^{-s})}$$

where the product is over all primes in Z.
Note: The product converges and gives an analytic function for all Re(s) > 3/2. This follows
from the fact that $| a_p | \leq 2\sqrt{p}$. However, far more is true:


Theorem 21 (Taylor, Wiles). The L-series L(E, s) has an analytic continuation to the
entire complex plane, and it satisfies the following functional equation. Define

$$\Lambda(E, s) = (N_{E/\mathbb{Q}})^{s/2} (2\pi)^{-s} \Gamma(s) L(E, s)$$

where $N_{E/\mathbb{Q}}$ is the conductor of E and $\Gamma$ is the gamma function. Then:

$$\Lambda(E, s) = w\Lambda(E, 2 - s) \quad \text{with } w = \pm 1$$

The number w above is usually called the root number of E, and it has an important
conjectural meaning (see Birch and Swinnerton-Dyer conjecture).
This result was known only for elliptic curves having complex multiplication (Deuring, Weil) until
the general result was finally proven.


210.4 Mazur's theorem on torsion of elliptic curves


Theorem 22 (Mazur). Let E/Q be an elliptic curve. Then the torsion subgroup $E_{\text{torsion}}(\mathbb{Q})$
is exactly one of the following groups:

$$\mathbb{Z}/N\mathbb{Z} \quad 1 \leq N \leq 10 \text{ or } N = 12$$
$$\mathbb{Z}/2\mathbb{Z} \oplus \mathbb{Z}/2N\mathbb{Z} \quad 1 \leq N \leq 4$$

Note: see Nagell-Lutz theorem for an efficient algorithm to compute the torsion subgroup of
an elliptic curve defined over Q.


210.5 Mordell curve


A Mordell curve is an elliptic curve E/K, for some field K, which admits a model by a
Weierstrass equation of the form:

$$y^2 = x^3 + k, \quad k \in K$$

Examples:
1. Let $E_1/\mathbb{Q}\colon y^2 = x^3 + 2$. This is a Mordell curve with Mordell-Weil group $E_1(\mathbb{Q}) \cong \mathbb{Z}$,
generated by $(-1, 1)$.


2. Let $E_2/\mathbb{Q}\colon y^2 = x^3 + 109858299531561$, then $E_2(\mathbb{Q}) \cong \mathbb{Z}/3\mathbb{Z} \oplus \mathbb{Z}^5$. See generators
here.


3. In general, a Mordell curve of the form $y^2 = x^3 + n^2$ has torsion group isomorphic to
$\mathbb{Z}/3\mathbb{Z}$ generated by $(0, n)$.


4. Let $E_3/\mathbb{Q}\colon y^2 = x^3 + 496837487681$, then this is a Mordell curve with $E_3(\mathbb{Q}) \cong \mathbb{Z}^8$. See
generators here.

5. Here you can find a list of the minimal-known positive and negative k for Mordell
curves of given rank, and the Mordell curves with maximum rank known (see B-SD
conjecture).
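Two of the claims in the examples can be checked by hand or by machine: that $(-1, 1)$ lies on $y^2 = x^3 + 2$, and that $(0, n)$ has order 3 on $y^2 = x^3 + n^2$. The sketch below assumes the usual chord-and-tangent addition formulas for a short Weierstrass model, which are not stated in this entry; the function names are hypothetical, and `None` stands for the point at infinity.

```python
from fractions import Fraction


def on_curve(P, k):
    """True if P is the point at infinity or satisfies y^2 = x^3 + k."""
    return P is None or P[1] ** 2 == P[0] ** 3 + k


def add(P, Q, k):
    """Chord-and-tangent sum of two rational points on y^2 = x^3 + k."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return None  # P + (-P) = O, including doubling a point with y = 0
    if x1 == x2:
        slope = Fraction(3 * x1 * x1) / Fraction(2 * y1)  # tangent line
    else:
        slope = Fraction(y2 - y1) / Fraction(x2 - x1)      # chord
    x3 = slope * slope - x1 - x2
    y3 = slope * (x1 - x3) - y1
    return (x3, y3)
```

With P = (0, 7) on $y^2 = x^3 + 49$, doubling gives 2P = (0, -7) = -P, so 3P is the point at infinity.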
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210.6 Nagell-Lutz theorem 


The following theorem, proved independently by E. Lutz and T. Nagell, gives a very efficient
method to compute the torsion subgroup of an elliptic curve defined over Q.


Theorem 23 (Nagell-Lutz Theorem). Let E/Q be an elliptic curve with Weierstrass
equation:

$$y^2 = x^3 + Ax + B, \quad A, B \in \mathbb{Z}$$

Then for all non-zero torsion points P we have:
1. The coordinates of P are in Z, i.e.
$$x(P), y(P) \in \mathbb{Z}$$
2. If P is of order greater than 2, then
$$y(P)^2 \text{ divides } 4A^3 + 27B^2$$
3. If P is of order 2 then
$$y(P) = 0 \quad \text{and} \quad x(P)^3 + Ax(P) + B = 0$$
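The theorem yields a finite search: list the integers y with y = 0 or $y^2 \mid 4A^3 + 27B^2$, then solve the cubic for integer x. The sketch below, with hypothetical function names, produces this candidate list. The conditions of the theorem are necessary but not sufficient, so each candidate still has to be confirmed as a torsion point, for example by checking that its multiples stay integral.

```python
import math


def divisors(n):
    """Positive divisors of |n| (empty list for n = 0)."""
    n = abs(n)
    small = [d for d in range(1, math.isqrt(n) + 1) if n % d == 0]
    return sorted(set(small + [n // d for d in small]))


def integer_roots(A, C):
    """Integer solutions x of x^3 + A*x + C = 0."""
    if C == 0:
        candidates = {0}
        if A < 0 and math.isqrt(-A) ** 2 == -A:
            r = math.isqrt(-A)
            candidates |= {r, -r}
    else:
        # any integer root must divide the constant term
        candidates = {s * d for d in divisors(C) for s in (1, -1)}
    return sorted(x for x in candidates if x ** 3 + A * x + C == 0)


def torsion_candidates(A, B):
    """Integer points (x, y) on y^2 = x^3 + A*x + B allowed by Nagell-Lutz."""
    disc = 4 * A ** 3 + 27 * B ** 2
    if disc == 0:
        raise ValueError("singular curve")
    ys = {0} | {s * y for y in divisors(disc) if disc % (y * y) == 0 for s in (1, -1)}
    return sorted((x, y) for y in ys for x in integer_roots(A, B - y * y))
```

For $y^2 = x^3 + 1$ the search returns five points, all of which are torsion; together with the point at infinity they form a group of order 6.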
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210.7 Selmer group 


Given an elliptic curve E we can define two very interesting and important groups, the
Selmer group and the Tate-Shafarevich group, which together provide a measure of the
failure of the Hasse principle for elliptic curves, by measuring whether the curve is everywhere
locally soluble. Here we present the construction of these groups.


Let E, E' be elliptic curves defined over Q and let $\bar{\mathbb{Q}}$ be an algebraic closure of Q. Let
$\phi\colon E \to E'$ be a non-constant isogeny (for example, we can let E = E' and think of $\phi$ as
being the "multiplication by n" map, $[n]\colon E \to E$). The following standard result asserts
that $\phi$ is surjective over $\bar{\mathbb{Q}}$:


Theorem 24. Let $C_1$, $C_2$ be curves defined over an algebraically closed field K and let

$$\psi\colon C_1 \to C_2$$

be a morphism (or algebraic map) of curves. Then $\psi$ is either constant or surjective.


See [4], Chapter II.6.8.


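Since $\phi\colon E(\bar{\mathbb{Q}}) \to E'(\bar{\mathbb{Q}})$ is non-constant, it must be surjective and we obtain the following
exact sequence:

$$0 \to E(\bar{\mathbb{Q}})[\phi] \to E(\bar{\mathbb{Q}}) \to E'(\bar{\mathbb{Q}}) \to 0 \qquad (1)$$

where $E(\bar{\mathbb{Q}})[\phi] = \operatorname{Ker} \phi$. Let $G = \operatorname{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$, the absolute Galois group of Q, and consider
the i-th cohomology group $H^i(G, E(\bar{\mathbb{Q}}))$ (we abbreviate by $H^i(G, E)$). Using equation (1) we
obtain the following long exact sequence (see the entry on group cohomology):

$$0 \to H^0(G, E(\bar{\mathbb{Q}})[\phi]) \to H^0(G, E) \to H^0(G, E') \to H^1(G, E(\bar{\mathbb{Q}})[\phi]) \to H^1(G, E) \to H^1(G, E') \qquad (2)$$

Note that

$$H^0(G, E(\bar{\mathbb{Q}})[\phi]) = (E(\bar{\mathbb{Q}})[\phi])^G = E(\mathbb{Q})[\phi]$$

and similarly

$$H^0(G, E) = E(\mathbb{Q}), \qquad H^0(G, E') = E'(\mathbb{Q})$$

From (2) we can obtain an exact sequence:

$$0 \to E'(\mathbb{Q})/\phi(E(\mathbb{Q})) \to H^1(G, E(\bar{\mathbb{Q}})[\phi]) \to H^1(G, E)[\phi] \to 0$$

We could repeat the same procedure but this time for E, E' defined over $\mathbb{Q}_p$, for some
prime number p, and obtain a similar exact sequence but with coefficients in $\mathbb{Q}_p$, which
relates to the original in the following commutative diagram (here $G_p = \operatorname{Gal}(\bar{\mathbb{Q}}_p/\mathbb{Q}_p)$):

$$\begin{array}{ccccccccc}
0 & \to & E'(\mathbb{Q})/\phi(E(\mathbb{Q})) & \to & H^1(G, E(\bar{\mathbb{Q}})[\phi]) & \to & H^1(G, E)[\phi] & \to & 0 \\
 & & \downarrow & & \downarrow & & \downarrow & & \\
0 & \to & E'(\mathbb{Q}_p)/\phi(E(\mathbb{Q}_p)) & \to & H^1(G_p, E(\bar{\mathbb{Q}}_p)[\phi]) & \to & H^1(G_p, E)[\phi] & \to & 0
\end{array}$$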


The goal here is to find a finite group containing $E'(\mathbb{Q})/\phi(E(\mathbb{Q}))$. Unfortunately $H^1(G, E(\bar{\mathbb{Q}})[\phi])$
is not necessarily finite. With this purpose in mind, we define the $\phi$-Selmer group:

$$S^{\phi}(E/\mathbb{Q}) = \operatorname{Ker}\Bigl( H^1(G, E(\bar{\mathbb{Q}})[\phi]) \to \prod_p H^1(G_p, E) \Bigr)$$

Equivalently, the $\phi$-Selmer group is the set of elements $\gamma$ of $H^1(G, E(\bar{\mathbb{Q}})[\phi])$ whose image
$\gamma_p$ in $H^1(G_p, E(\bar{\mathbb{Q}}_p)[\phi])$ comes from some element in $E'(\mathbb{Q}_p)$.


Finally, by imitation of the definition of the Selmer group, we define the Tate-Shafarevich
group:

$$TS(E/\mathbb{Q}) = \operatorname{Ker}\Bigl( H^1(G, E) \to \prod_p H^1(G_p, E) \Bigr)$$

The Tate-Shafarevich group is precisely the group that measures the failure of the Hasse principle
for the elliptic curve E. It is unknown if this group is finite.


210.8 bad reduction


210.8.1 Singular Cubic Curves 


Let E be a cubic curve over a field K with Weierstrass equation f(x, y) = 0, where:

$$f(x, y) = y^2 + a_1xy + a_3y - x^3 - a_2x^2 - a_4x - a_6$$

which has a singular point $P = (x_0, y_0)$. This is equivalent to:

$$\partial f/\partial x(P) = \partial f/\partial y(P) = 0$$

and so we can write the Taylor expansion of f(x, y) at $(x_0, y_0)$ as follows:

$$\begin{aligned}
f(x, y) - f(x_0, y_0) &= \lambda_1(x - x_0)^2 + \lambda_2(x - x_0)(y - y_0) + \lambda_3(y - y_0)^2 - (x - x_0)^3 \\
&= [(y - y_0) - \alpha(x - x_0)][(y - y_0) - \beta(x - x_0)] - (x - x_0)^3
\end{aligned}$$

for some $\lambda_i \in K$ and $\alpha, \beta \in \bar{K}$ (an algebraic closure of K).
Definition 16. The singular point P is a node if $\alpha \neq \beta$. In this case there are two different
tangent lines to E at P, namely:

$$y - y_0 = \alpha(x - x_0), \qquad y - y_0 = \beta(x - x_0)$$

If $\alpha = \beta$ then we say that P is a cusp, and there is a unique tangent line at P.


Note: See the entry for elliptic curve for examples of cusps and nodes.


There is a very simple criterion to know whether a cubic curve in Weierstrass form is singular
and to differentiate nodes from cusps:


Proposition 10. Let E/K be given by a Weierstrass equation, and let $\Delta$ be the discriminant
and $c_4$ as in the definition of $\Delta$. Then:


1. E is singular if and only if $\Delta = 0$,
2. E has a node if and only if $\Delta = 0$ and $c_4 \neq 0$,
3. E has a cusp if and only if $\Delta = 0 = c_4$.


See [2], chapter III, 1.4, page 50.


210.8.2 Reduction of Elliptic Curves 


Let E/Q be an elliptic curve (we could work over any number field K, but we choose Q for
simplicity in the exposition). Assume that E has a Weierstrass equation:

$$y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6$$

with coefficients in Z. Let p be a prime in Z. By reducing each of the coefficients $a_i$ modulo p
we obtain the equation of a cubic curve $\tilde{E}$ over the finite field $\mathbb{F}_p$ (the field with p elements).

Definition 17.


1. If $\tilde{E}$ is a non-singular curve then $\tilde{E}$ is an elliptic curve over $\mathbb{F}_p$ and we say that E has
good reduction at p. Otherwise, we say that E has bad reduction at p.


2. If $\tilde{E}$ has a cusp then we say that E has additive reduction at p.


3. If $\tilde{E}$ has a node then we say that E has multiplicative reduction at p. If the slopes
of the tangent lines ($\alpha$ and $\beta$ as above) are in $\mathbb{F}_p$ then the reduction is said to be split
multiplicative (and non-split otherwise).


From Proposition 10 we deduce the following:


Corollary 4. Let E/Q be an elliptic curve with coefficients in Z. Let $p \in \mathbb{Z}$ be a prime. If
E has bad reduction at p then $p \mid \Delta$.


Examples:


1. $E_1\colon y^2 = x^3 + 35x + 5$ has good reduction at p = 7.


2. However $E_1$ has bad reduction at p = 5, and the reduction is additive (since modulo 5
we can write the equation as $[(y - 0) - 0(x - 0)]^2 - x^3$ and the slope is 0).


3. The elliptic curve $E_2\colon y^2 = x^3 - x^2 + 35$ has bad multiplicative reduction at 5 and 7.
The reduction at 5 is split, while the reduction at 7 is non-split. Indeed, modulo 5 we
can write the equation as $[(y - 0) - 2(x - 0)][(y - 0) + 2(x - 0)] - x^3$, the slopes being
2 and $-2$. However, for p = 7 the slopes are not in $\mathbb{F}_7$ ($\sqrt{-1}$ is not in $\mathbb{F}_7$).
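The three examples can be checked mechanically with the criterion of Proposition 10. The sketch below assumes the standard formulas for $b_2, b_4, b_6, b_8, c_4$ and $\Delta$ of a Weierstrass equation, which this entry refers to but does not state. The function names are hypothetical, and the split versus non-split distinction is not computed.

```python
def invariants(a1, a2, a3, a4, a6):
    """Return (c4, discriminant) of y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6."""
    b2 = a1 * a1 + 4 * a2
    b4 = 2 * a4 + a1 * a3
    b6 = a3 * a3 + 4 * a6
    b8 = a1 * a1 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3 * a3 - a4 * a4
    c4 = b2 * b2 - 24 * b4
    disc = -b2 * b2 * b8 - 8 * b4 ** 3 - 27 * b6 * b6 + 9 * b2 * b4 * b6
    return c4, disc


def reduction_type(coeffs, p):
    """Classify the reduction of the given integral equation at the prime p."""
    c4, disc = invariants(*coeffs)
    if disc % p != 0:
        return "good"            # reduced curve is non-singular
    if c4 % p == 0:
        return "additive"        # reduced curve has a cusp
    return "multiplicative"      # reduced curve has a node
```

For $E_1$ the discriminant is $-2754800 = -2^4 \cdot 5^2 \cdot 71 \cdot 97$ with $c_4 = -1680$, and for $E_2$ it is $-526960 = -2^4 \cdot 5 \cdot 7 \cdot 941$ with $c_4 = 16$, in agreement with the examples.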
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210.9 conductor of an elliptic curve 


Let E be an elliptic curve over Q. For each prime $p \in \mathbb{Z}$ define the quantity $f_p$ as follows:

$$f_p = \begin{cases}
0, & \text{if } E \text{ has good reduction at } p, \\
1, & \text{if } E \text{ has multiplicative reduction at } p, \\
2, & \text{if } E \text{ has additive reduction at } p, \text{ and } p \neq 2, 3, \\
2 + \delta_p, & \text{if } E \text{ has additive reduction at } p = 2 \text{ or } 3.
\end{cases}$$

where $\delta_p$ depends on wild ramification in the action of the inertia group at p of $\operatorname{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$
on the Tate module $T_p(E)$.


Definition 18. The conductor $N_{E/\mathbb{Q}}$ of E/Q is defined to be:

$$N_{E/\mathbb{Q}} = \prod_p p^{f_p}$$

where the product is over all primes and the exponent $f_p$ is defined as above.


Example:


Let $E/\mathbb{Q}\colon y^2 + y = x^3 - x^2 + 2x - 2$. The primes of bad reduction for E are p = 5 and 7.
The reduction at p = 5 is additive, while the reduction at p = 7 is multiplicative. Hence
$N_{E/\mathbb{Q}} = 5^2 \cdot 7 = 175$.
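The reduction types in this example can be verified with the criterion of Proposition 10 in the entry on bad reduction. The computation below uses the standard quantities $b_2, b_4, b_6, b_8, c_4, \Delta$ of a Weierstrass equation, which are assumed here and not defined in this entry, with $(a_1, a_2, a_3, a_4, a_6) = (0, -1, 1, 2, -2)$.

```latex
\begin{align*}
b_2 &= a_1^2 + 4a_2 = -4, \qquad b_4 = 2a_4 + a_1a_3 = 4, \qquad b_6 = a_3^2 + 4a_6 = -7, \\
b_8 &= a_1^2a_6 + 4a_2a_6 - a_1a_3a_4 + a_2a_3^2 - a_4^2 = 8 - 1 - 4 = 3, \\
c_4 &= b_2^2 - 24b_4 = 16 - 96 = -80, \\
\Delta &= -b_2^2b_8 - 8b_4^3 - 27b_6^2 + 9b_2b_4b_6 = -48 - 512 - 1323 + 1008 = -875 = -5^3 \cdot 7.
\end{align*}
```

Since 5 divides both $\Delta$ and $c_4$, the reduction at 5 has a cusp, so $f_5 = 2$. Since 7 divides $\Delta$ but not $c_4$, the reduction at 7 has a node, so $f_7 = 1$. All other primes have $f_p = 0$, which gives $5^2 \cdot 7 = 175$.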
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210.10 elliptic curve 


210.10.1 Basics 


An elliptic curve over a field K is a nonsingular projective curve E over K of genus
1 together with a point O of E defined over K. The word "genus" is taken here in the
algebraic sense, and has no relation with the topological notion of genus (defined
as $1 - \chi/2$, where $\chi$ is the Euler characteristic) except when the field of definition K is the
field of complex numbers $\mathbb{C}$.

Figure 210.1: Graph of $y^2 = x(x - 1)(x + 1)$
Figure 210.2: Graph of $y^2 = x^3 - x + 1$
Using the Riemann-Roch theorem one can show that every elliptic curve E is the
zero set of a Weierstrass equation of the form

$$E : y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6,$$

for some $a_i \in K$, where the polynomial on the right hand side has no double roots. When K
has characteristic other than 2 or 3, one can further simplify this Weierstrass equation into
the form

$$E : y^2 = x^3 - 27 c_4 x - 54 c_6.$$

The extremely strange numbering of the coefficients is an artifact of the process by which the
above equations are derived. Also, note that these equations are for affine curves; to translate
them to projective curves, one has to homogenize the equations (replace x with X/Z, and y
with Y/Z).


210.10.2 Examples 


We present here some pictures of elliptic curves over the field R of real numbers. These
pictures are in some sense not representative of most of the elliptic curves that people work
with, since many of the interesting cases tend to be of elliptic curves over algebraically closed
fields. However, curves over the complex numbers (or, even worse, over algebraically closed
fields in characteristic p) are very difficult to graph in three dimensions, let alone two.


Figure 210.1 is a graph of the elliptic curve $y^2 = x^3 - x$.
Figure 210.2 shows the graph of $y^2 = x^3 - x + 1$.


Finally, Figures 210.3 and 210.4 are examples of algebraic curves that are not elliptic curves.
Both of these curves have singularities at the origin.


Figure 210.3: Graph of $y^2 = x^2(x + 1)$. Has two tangents at the origin.
Figure 210.4: Graph of $y^2 = x^3$. Has a cusp at the origin.


210.10.3 The Group Law 


The points on an elliptic curve have a natural group structure, which makes the elliptic curve
into an abelian variety. There are many equivalent ways to define this group structure; two
of the most common are:


• Every Weil divisor on E is linearly equivalent to a unique divisor of the form [P] − [O]
for some P ∈ E, where O ∈ E is the base point. The divisor class group of E then
yields a group structure on the points of E, by way of this correspondence.


• Let O ∈ E denote the base point. Then one can show that every line joining two points
on E intersects a unique third point of E (after properly accounting for tangent lines
as a multiple intersection). For any two points P, Q ∈ E, define their sum as:


1. Form the line between P and Q; let R be the third point on E that intersects this
line;
2. Form the line between O and R; define P + Q to be the third point on E that
intersects this line.


This addition operation yields a group operation on the points of E having the base
point O for identity.
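
The two-step chord construction can be made concrete. The sketch below assumes the simplified form $y^2 = x^3 + ax + b$ over Q with O the point at infinity, so that "the third point on the line through O and R" is the reflection of R in the x-axis. The representation of O as `None` and the function name `add` are choices made here for illustration, not part of the entry.

```python
from fractions import Fraction

O = None  # the base point (point at infinity)


def add(P, Q, a):
    """Add two points on y^2 = x^3 + a*x + b (b does not enter the formulas)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return O  # the line through P and Q is vertical; its third point is O
    if P == Q:
        s = Fraction(3 * x1 * x1 + a, 2 * y1)  # slope of the tangent line at P
    else:
        s = Fraction(y2 - y1, x2 - x1)         # slope of the chord through P and Q
    x3 = s * s - x1 - x2                       # x-coordinate of the third point R
    y3 = s * (x1 - x3) - y1                    # reflect R to obtain P + Q
    return (x3, y3)


# Multiples of (2, 3) on y^2 = x^3 + 1.
P = (2, 3)
Q = P
for k in range(2, 7):
    Q = add(Q, P, 0)
    print(k, Q)
```

On $y^2 = x^3 + 1$ the point (2, 3) returns to O after six additions, which illustrates that a point can have finite order under this operation.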


210.10.4 Elliptic Curves over C 


Over the complex numbers, the general correspondence between algebraic and analytic theory
specializes in the elliptic curves case to yield some very useful insights into the structure of
elliptic curves over C. The starting point for this investigation is the Weierstrass $\wp$-function,
which we define here.

Definition 10. A lattice in C is a subgroup L of the additive group C which is generated
by two elements $\omega_1, \omega_2 \in \mathbb{C}$ that are linearly independent over R.


Definition 11. For any lattice L in C, the Weierstrass $\wp_L$-function of L is the function
$\wp_L : \mathbb{C} \to \mathbb{C}$ given by

$$\wp_L(z) := \frac{1}{z^2} + \sum_{\omega \in L \setminus \{0\}} \left( \frac{1}{(z - \omega)^2} - \frac{1}{\omega^2} \right).$$

When the lattice L is clear from context, it is customary to suppress it from the notation
and simply write $\wp$ for the Weierstrass $\wp$-function.


Properties of the Weierstrass $\wp$-function:

• $\wp(z)$ is a meromorphic function with double poles at points in L.
• $\wp(z)$ is constant on each coset of C/L.
• $\wp(z)$ satisfies the differential equation

$$\wp'(z)^2 = 4\wp(z)^3 - g_2 \wp(z) - g_3$$

where the constants $g_2$ and $g_3$ are given by

$$g_2 = 60 \sum_{\omega \in L \setminus \{0\}} \frac{1}{\omega^4}, \qquad g_3 = 140 \sum_{\omega \in L \setminus \{0\}} \frac{1}{\omega^6}.$$
The last property above implies that, for any z ∈ C/L, the point $(\wp(z), \wp'(z))$ lies on the
elliptic curve $E : y^2 = 4x^3 - g_2 x - g_3$. Let $\phi : \mathbb{C}/L \to E$ be the map given by

$$\phi(z) = \begin{cases} (\wp(z), \wp'(z)), & z \notin L, \\ \infty, & z \in L \end{cases}$$

(where $\infty$ denotes the point at infinity on E). Then $\phi$ is actually a bijection (!), and
moreover the map $\phi : \mathbb{C}/L \to E$ is an isomorphism of Riemann surfaces as well as a group
isomorphism (with the addition operation on C/L inherited from C, and the elliptic curve
group operation on E).


We can go even further: it turns out that every elliptic curve E over C can be obtained in
this way from some lattice L. More precisely, the following is true:


Theorem 16. 1. For every elliptic curve $E : y^2 = 4x^3 - bx - c$ over C, there is a unique
lattice $L \subset \mathbb{C}$ whose constants $g_2$ and $g_3$ satisfy $b = g_2$ and $c = g_3$.


2. Two elliptic curves E and E′ over C are isomorphic if and only if their corresponding
lattices L and L′ satisfy the equation $L' = \alpha L$ for some scalar $\alpha \in \mathbb{C}$.
210.11 height function


Definition 19. Let A be an abelian group. A height function on A is a function $h : A \to \mathbb{R}$
with the properties:

1. For all Q ∈ A there exists a constant $C_1$, depending on A and Q, such that for all
P ∈ A:
$$h(P + Q) \le 2h(P) + C_1$$

2. There exists an integer $m \ge 2$ and a constant $C_2$, depending on A, such that for all
P ∈ A:
$$h(mP) \ge m^2 h(P) - C_2$$

3. For all $C_3 \in \mathbb{R}$, the following set is finite:
$$\{P \in A : h(P) \le C_3\}$$


Examples:


1. For t = p/q ∈ Q, a fraction in lowest terms, define $H(t) = \max\{|p|, |q|\}$. Even
though this is not a height function as defined above, this is the prototype of what a
height function should look like.


2. Let E be an elliptic curve over Q. The function on E(Q), the points in E with
coordinates in Q, $h_x : E(\mathbb{Q}) \to \mathbb{R}$:

$$h_x(P) = \begin{cases} \log H(x(P)), & \text{if } P \neq O, \\ 0, & \text{if } P = O \end{cases}$$

is a height function (H is defined as above). Notice that this depends on the chosen
Weierstrass model of the curve.


3. The canonical height of E/Q (due to Néron and Tate) is defined by:

$$h_C(P) = \frac{1}{2} \lim_{N \to \infty} 4^{-N} h_x([2^N]P)$$

where $h_x$ is defined as in (2).


Finally we mention the fundamental theorem of "descent", which highlights the importance
of the height functions:


Theorem 25 (Descent). Let A be an abelian group and let $h : A \to \mathbb{R}$ be a height function.
Suppose that for the integer m, as in property (2) of height, the quotient group A/mA is
finite. Then A is finitely generated.
210.12 j-invariant


Let E be an elliptic curve over Q with Weierstrass equation:
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$$
with coefficients $a_i \in \mathbb{Q}$. Let:

$$\begin{aligned}
b_2 &= a_1^2 + 4a_2, \\
b_4 &= 2a_4 + a_1 a_3, \\
b_6 &= a_3^2 + 4a_6, \\
b_8 &= a_1^2 a_6 + 4a_2 a_6 - a_1 a_3 a_4 + a_2 a_3^2 - a_4^2, \\
c_4 &= b_2^2 - 24 b_4, \\
c_6 &= -b_2^3 + 36 b_2 b_4 - 216 b_6.
\end{aligned}$$


Definition 20.
1. The discriminant of E is defined to be
$$\Delta = -b_2^2 b_8 - 8 b_4^3 - 27 b_6^2 + 9 b_2 b_4 b_6$$

2. The j-invariant of E is
$$j = \frac{c_4^3}{\Delta}$$

3. The invariant differential is
$$\omega = \frac{dx}{2y + a_1 x + a_3} = \frac{dy}{3x^2 + 2a_2 x + a_4 - a_1 y}$$


Example:


If E has a Weierstrass equation in the simplified form $y^2 = x^3 + Ax + B$ then

$$\Delta = -16(4A^3 + 27B^2), \qquad j = -1728 \frac{(4A)^3}{\Delta}.$$
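
The definitions above translate directly into a few lines of exact arithmetic. The following sketch assumes integer coefficients (so that `Fraction` can be built from integers); the function name `weierstrass_invariants` and the dictionary layout are illustrative choices.

```python
from fractions import Fraction


def weierstrass_invariants(a1, a2, a3, a4, a6):
    """b-, c-invariants, discriminant and j-invariant of
    y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6 (integer coefficients)."""
    b2 = a1**2 + 4 * a2
    b4 = 2 * a4 + a1 * a3
    b6 = a3**2 + 4 * a6
    b8 = a1**2 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3**2 - a4**2
    c4 = b2**2 - 24 * b4
    c6 = -b2**3 + 36 * b2 * b4 - 216 * b6
    delta = -b2**2 * b8 - 8 * b4**3 - 27 * b6**2 + 9 * b2 * b4 * b6
    # j is only defined for a non-singular curve (delta != 0).
    j = Fraction(c4**3, delta) if delta != 0 else None
    return {"b2": b2, "b4": b4, "b6": b6, "b8": b8,
            "c4": c4, "c6": c6, "delta": delta, "j": j}


# Simplified form y^2 = x^3 + A*x + B with A = B = 1.
inv = weierstrass_invariants(0, 0, 0, 1, 1)
print(inv["delta"], inv["j"])
```

For the simplified form the output can be compared against the closed formulas in the example, $\Delta = -16(4A^3 + 27B^2)$ and $j = -1728(4A)^3/\Delta$.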
210.13 rank of an elliptic curve


Let K be a number field and let E be an elliptic curve over K. By E(K) we denote the set
of points in E with coordinates in K.


Theorem 26 (Mordell-Weil). E(K) is a finitely generated abelian group.


The proof of this theorem is fairly involved. The main ingredients are the so called
"weak Mordell-Weil theorem" (see below), the concept of height function for abelian groups
and the "descent" theorem.

See [2], Chapter VIII, page 189.


Theorem 27 (Weak Mordell-Weil). E(K)/mE(K) is finite for all $m \ge 2$.


The Mordell-Weil theorem implies that for any elliptic curve E/K the group of points has
the following structure:
$$E(K) \cong E_{\mathrm{torsion}}(K) \oplus \mathbb{Z}^R$$

where $E_{\mathrm{torsion}}(K)$ denotes the set of points of finite order (or torsion group), and R is a
non-negative integer which is called the rank of the elliptic curve. It is not known how big
this number R can get for elliptic curves over Q. The largest rank known for an elliptic curve
over Q is 24 (Martin-McMillen, 2000).

Note: see Mazur's theorem for an account of the possible torsion subgroups over Q.


Examples:


1. The elliptic curve $E_1/\mathbb{Q} : y^2 = x^3 + 6$ has rank 0 and $E_1(\mathbb{Q}) \cong 0$.


2. Let $E_2/\mathbb{Q} : y^2 = x^3 + 1$, then $E_2(\mathbb{Q}) \cong \mathbb{Z}/6\mathbb{Z}$. The torsion group is generated by the
point (2, 3).
3. Let E3/Q: y? = x? + 109858299531561, then F3(Q) ~ Z/3Z@Z°. See 
here. 


4. Let E,/Q: y2+1951/164ry — 3222367 /40344y = 23+3537/164x? — 40302641/121032z, 
then £,(Q) ~ Z". See |generators| here. 
210.14 supersingular


An elliptic curve E over a field of characteristic p defined by the cubic equation f(w, x, y) = 0
is called supersingular if the coefficient of $(wxy)^{p-1}$ in $f(w, x, y)^{p-1}$ is zero.


A supersingular elliptic curve is said to have Hasse invariant 0; an ordinary (i.e. non-supersingular)
elliptic curve is said to have Hasse invariant 1.


This is equivalent to many other conditions. E is supersingular iff the invariant differential
is exact. Also, E is supersingular iff $F^* : H^1(E, \mathcal{O}_E) \to H^1(E, \mathcal{O}_E)$ is zero, where $F^*$ is
induced from the Frobenius morphism $F : E \to E$.
210.15 the torsion subgroup of an elliptic curve injects
in the reduction of the curve


Let E be an elliptic curve defined over Q and let p ∈ Z be a prime. Assume E has a
Weierstrass equation of the form:
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$$

with coefficients $a_i \in \mathbb{Z}$. Let $\widetilde{E}$ be the reduction of E modulo p (see bad reduction), which is a
curve defined over $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. We have a map (the reduction map)

$$\pi_p : E(\mathbb{Q}) \to \widetilde{E}(\mathbb{F}_p)$$

$$\pi_p(P) = \pi_p([x_0, y_0, z_0]) = [x_0 \bmod p,\ y_0 \bmod p,\ z_0 \bmod p] = \widetilde{P}$$

Recall that $\widetilde{E}$ might be a singular curve at some points. We denote by $\widetilde{E}_{ns}(\mathbb{F}_p)$ the set of
non-singular points of $\widetilde{E}$. We also define

$$E_0(\mathbb{Q}) = \{P \in E(\mathbb{Q}) \mid \pi_p(P) = \widetilde{P} \in \widetilde{E}_{ns}(\mathbb{F}_p)\}$$

$$E_1(\mathbb{Q}) = \{P \in E(\mathbb{Q}) \mid \pi_p(P) = \widetilde{P} = \widetilde{O}\} = \mathrm{Ker}(\pi_p)$$
Proposition 11. There is an exact sequence of abelian groups
$$0 \to E_1(\mathbb{Q}) \to E_0(\mathbb{Q}) \to \widetilde{E}_{ns}(\mathbb{F}_p) \to 0$$

where the right-hand side map is $\pi_p$ restricted to $E_0(\mathbb{Q})$.


Notation: Given an abelian group G, we denote by G[m] the m-torsion of G, i.e. the points of order
dividing m.


Proposition 12. Let E/Q be an elliptic curve (as above) and let m be a positive integer
such that gcd(p, m) = 1. Then:

1. $E_1(\mathbb{Q})[m] = \{O\}$.

2. If $\widetilde{E}(\mathbb{F}_p)$ is a non-singular curve, then the reduction map, restricted to $E(\mathbb{Q})[m]$, is
injective. That is,
$$E(\mathbb{Q})[m] \to \widetilde{E}(\mathbb{F}_p)$$
is injective.


Remark: part 2 of the proposition is quite useful when trying to compute the torsion subgroup
of E/Q. Note that this can be reinterpreted as follows: for all primes p which do
not divide m, $E(\mathbb{Q})[m] \to \widetilde{E}(\mathbb{F}_p)$ must be injective and therefore the number of m-torsion
points divides the number of points defined over $\mathbb{F}_p$.


Example:
Let E/Q be given by
$$y^2 = x^3 + 3$$

The discriminant of this curve is $\Delta = -3888 = -2^4 3^5$. Recall that if p is a prime of bad
reduction, then $p \mid \Delta$. Thus the only primes of bad reduction are 2, 3, so $\widetilde{E}$ is non-singular
for all $p \ge 5$.


Let p = 5 and consider the reduction of E modulo 5, $\widetilde{E}$. Then we have
$$\widetilde{E}(\mathbb{Z}/5\mathbb{Z}) = \{\widetilde{O}, (1, 2), (1, 3), (2, 1), (2, 4), (3, 0)\}$$

where all the coordinates are to be considered modulo 5 (remember the point at infinity).
Hence $N_5 = |\widetilde{E}(\mathbb{Z}/5\mathbb{Z})| = 6$. Similarly, we can prove that $N_7 = 13$.


Now let $q \neq 5, 7$ be a prime number. Then we claim that $E(\mathbb{Q})[q]$ is trivial. Indeed, by the
remark above we have

$$|E(\mathbb{Q})[q]| \text{ divides } N_5 = 6 \text{ and } N_7 = 13$$
so $|E(\mathbb{Q})[q]|$ must be 1.

For the case q = 5 we know that $|E(\mathbb{Q})[5]|$ divides $N_7 = 13$. But it is easy to see that
if $E(\mathbb{Q})[p]$ is non-trivial, then p divides its order. Since 5 does not divide 13, we conclude
that $E(\mathbb{Q})[5]$ must be trivial. Similarly $E(\mathbb{Q})[7]$ is trivial as well. Therefore E(Q) has trivial
torsion subgroup.


Notice that (1, 2) ∈ E(Q) is an obvious point in the curve. Since we have proved that there
is no non-trivial torsion, this point must be of infinite order. In fact
$$E(\mathbb{Q}) \cong \mathbb{Z}$$
and the group is generated by (1, 2).
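
The point counts $N_5$ and $N_7$ used in the example are easy to reproduce by brute force. This is a small sketch for curves $y^2 = x^3 + ax + b$ over $\mathbb{F}_p$ with p an odd prime; the function name `count_points` is hypothetical.

```python
from math import gcd


def count_points(a, b, p):
    """Number of points on y^2 = x^3 + a*x + b over F_p, including the point at infinity."""
    # roots[v] = number of y in F_p with y^2 = v
    roots = {}
    for y in range(p):
        v = y * y % p
        roots[v] = roots.get(v, 0) + 1
    affine = sum(roots.get((x**3 + a * x + b) % p, 0) for x in range(p))
    return affine + 1  # add the point at infinity


N5 = count_points(0, 3, 5)
N7 = count_points(0, 3, 7)
print(N5, N7, gcd(N5, N7))
```

Since any torsion point of order prime to 5 and 7 would force a common divisor of $N_5$ and $N_7$, a gcd of 1 is the numerical core of the argument above.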
Chapter 211


14H99 — Miscellaneous 


211.1 Riemann-Roch theorem 


Let C be a projective nonsingular curve over an algebraically closed field. If D is a
divisor on C, then

$$\ell(D) - \ell(K - D) = \deg(D) + 1 - g$$

where g is the genus of the curve, and K is the canonical divisor ($\ell(K) = g$).
211.2 genus


"Genus" has a number of distinct but compatible definitions.


In topology, if S is an orientable surface, its genus g(S) is the number of "handles" it has.
More precisely, from the classification of surfaces, we know that any orientable surface is a
sphere, or the connected sum of n tori. We say the sphere has genus 0, and that the connected
sum of n tori has genus n (alternatively, genus is additive with respect to connected sum,
and the genus of a torus is 1). Also, $g(S) = 1 - \chi(S)/2$ where $\chi(S)$ is the Euler characteristic
of S.


In algebraic geometry, the genus of a smooth projective curve X over a field k is the dimension
over k of the vector space $\Omega^1(X)$ of global differentials on X. Recall
that a smooth complex curve is also a Riemann surface, and hence topologically a surface.
In this case, the two definitions of genus coincide.
211.3 projective curve


A projective curve over a field k is an equidimensional projective variety over k of dimension
1. In other words, each of the irreducible components of this variety must have dimension 1.
211.4 proof of Riemann-Roch theorem


For a divisor D, let $\mathcal{L}(D)$ be the associated line bundle. By Serre duality, $H^0(\mathcal{L}(K - D)) \cong
H^1(\mathcal{L}(D))^{*}$, so $\ell(D) - \ell(K - D) = \chi(D)$, the Euler characteristic of $\mathcal{L}(D)$. Now, let p be a point of
C, and consider the divisors D and D + p. There is a natural injection $\mathcal{L}(D) \to \mathcal{L}(D + p)$. This
is an isomorphism anywhere away from p, so the quotient $\mathcal{E}$ is a skyscraper sheaf supported
at p. Since skyscraper sheaves are flasque, they have trivial higher cohomology, and so
$\chi(\mathcal{E}) = 1$. Since Euler characteristics add along exact sequences (because of the long exact
sequence in cohomology), $\chi(D + p) = \chi(D) + 1$. Since $\deg(D + p) = \deg(D) + 1$, we see that
if Riemann-Roch holds for D, it holds for D + p, and vice-versa. Now, we need only confirm
that the theorem holds for a single line bundle. $\mathcal{O}_X$ is a line bundle of degree 0, with $\ell(0) = 1$
and $\ell(K) = g$. Thus, Riemann-Roch holds here, and thus for all line bundles.
Chapter 212


14L17 — Affine algebraic groups, 
hyperalgebra constructions 


212.1 affine algebraic group 


An affine algebraic group over a field k is a quasi-affine variety G (a locally closed subset of
affine space) over k, which is equipped with a group structure such that the multiplication
map $m : G \times G \to G$ and inverse map $i : G \to G$ are algebraic.


For example, k is an affine algebraic group over itself with the group law being addition,
as is $k^* = k - \{0\}$ with the group law multiplication. Other common examples of affine
algebraic groups are $GL_n k$, the general linear group over k (identifying matrices with affine
space) and any algebraic torus over k.
212.2 algebraic torus


Let k be a field. Then $k^*$, the multiplicative group of k, is an affine algebraic group over k.


An affine algebraic group of the form $(k^*)^n$ is called an algebraic torus over k.


The name is connected to the fact that if k = C, then an algebraic torus is the complexification
of the standard torus $(S^1)^n$.
Chapter 213


14M05 — Varieties defined by ring 
conditions (factorial, 
Cohen-Macaulay, seminormal) 


213.1 normal 


Let X be a variety or scheme. X is said to be normal at a point p ∈ X if the local ring
$\mathcal{O}_p$ is integrally closed. X is said to be normal if it is normal at every point. If X is
non-singular at p, it is normal at p, since regular local rings are integrally closed.
Chapter 214


14M15 — Grassmannians, Schubert 
varieties, flag manifolds 


214.1 Borel-Bott-Weil theorem 


Let G be a semisimple Lie group and $\lambda$ be an integral weight for that group. $\lambda$ naturally
defines a one-dimensional representation $\mathbb{C}_\lambda$ of the Borel subgroup B of G, by simply pulling
back the representation on the maximal torus T = B/U where U is the unipotent radical
of B. Since we can think of the projection map $\pi : G \to G/B$ as a principal B-bundle, to
each $\mathbb{C}_\lambda$ we get an associated fiber bundle $\mathcal{L}_\lambda$ on G/B, which is a line bundle.
Identifying $\mathcal{L}_\lambda$ with its sheaf of holomorphic sections, we consider the sheaf cohomology
groups $H^i(\mathcal{L}_\lambda)$. Realizing $\mathfrak{g}$, the Lie algebra of G, as vector fields on G/B, we see that $\mathfrak{g}$ acts
on the sections of $\mathcal{L}_\lambda$ over any open set, and so we get an action on cohomology groups. This
integrates to an action of G, which on $H^0(\mathcal{L}_\lambda)$ is simply the obvious action of the group.


The Borel-Bott-Weil theorem states the following: if $(\lambda + \rho, \alpha) = 0$ for any simple root $\alpha$ of
$\mathfrak{g}$, then
$$H^i(\mathcal{L}_\lambda) = 0$$

for all i, where $\rho$ is half the sum of all the positive roots. Otherwise, let w ∈ W, the
Weyl group of G, be the unique element such that $w(\lambda + \rho)$ is dominant (i.e. $(w(\lambda + \rho), \alpha) > 0$
for all simple roots $\alpha$). Then


HOL) = Va 


where V; is the unique irreducible! representation of highest weight), and H*(£)) = 0 for all 
other i. In particular, if À is already dominant, then I'(£)) S Vy, and the higher cohomology 


of £, vanishes. 


If $\lambda$ is dominant, then $\mathcal{L}_\lambda$ is generated by global sections, and thus determines a map
This map is an obvious one, which takes the coset of B to the highest weight vector $v_0$ of $V_\lambda$.
This can be extended by equivariance since B fixes $v_0$. This provides an alternate description
of $\mathcal{L}_\lambda$.


For example, consider G = SL2C. G/B is CP!, the Riemann sphere) and an integral weight 
is specified simply by an [integer] n, and p = 1. The line bundle £,, is simply O(n), whose 
sections are the of n. This gives us in one stroke the 
representation theory] of SLC: ['(O(1)) is the standard representation, and ['(O(n)) is its 


nth We even have a unified decription of the action of the Lie algebra, 
derived from its realization| as vector fields on CP!: if H,X,Y are the standard [generators] 
of sloC, then 
214.2 flag variety


Let k be a field, and let V be a vector space over k of dimension n and choose an increasing
sequence $i = (i_1, \ldots, i_m)$, with $1 \le i_1 < \cdots < i_m \le n$. Then the (partial) flag variety $F\ell(V, i)$
associated to this data is the set of all flags $\{0\} \le V_1 \le \cdots \le V_m$ with $\dim V_j = i_j$. This
has a natural embedding into the product of Grassmannians $G(V, i_1) \times \cdots \times G(V, i_m)$, and its
image here is closed, making $F\ell(V, i)$ into a projective variety over k. If k = C these are
often called flag manifolds.


The group SL(V) acts transitively on $F\ell(V, i)$, and the stabilizer of a point is a parabolic subgroup.
Thus, as a homogeneous space, $F\ell(V, i) \cong SL(V)/P$ where P is a parabolic subgroup of
SL(V). In particular, the complete flag variety is isomorphic to SL(V)/B, where B is the
Borel subgroup.
Chapter 215


14R15 — Jacobian problem 


215.1 Jacobian conjecture 


Let $F : \mathbb{C}^n \to \mathbb{C}^n$ be a polynomial map, i.e.,
$$F(x_1, \ldots, x_n) = (f_1(x_1, \ldots, x_n), \ldots, f_n(x_1, \ldots, x_n))$$
for certain polynomials $f_i \in \mathbb{C}[X_1, \ldots, X_n]$.
If F is invertible, then its Jacobi determinant $\det(\partial f_i / \partial x_j)$, which is a polynomial over C,
vanishes nowhere and hence must be a non-zero constant.


The Jacobian conjecture asserts the converse: every polynomial map $\mathbb{C}^n \to \mathbb{C}^n$ whose Jacobi
determinant is a non-zero constant is invertible.
Chapter 216


15-00 — General reference works 
(handbooks, dictionaries, 
bibliographies, etc.) 


216.1 Cholesky decomposition 


A symmetric and positive definite matrix can be efficiently decomposed into a lower and
upper triangular matrix. For a matrix of any type, this is achieved by the LU decomposition
which factorizes A = LU. If A satisfies the above criteria, one can decompose more efficiently
into $A = LL^T$, where L (which can be seen as the "matrix square root" of A) is a lower
triangular matrix with positive diagonal elements. L is called the Cholesky triangle.


To solve Ax = b, one solves first Ly = b for y, and then $L^T x = y$ for x.


A variant of the Cholesky decomposition is the form $A = R^T R$, where R is upper triangular.


Cholesky decomposition is often used to solve the normal equations in linear least squares
problems; they give $A^T A x = A^T b$, in which $A^T A$ is symmetric and positive definite.


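To derive $A = LL^T$, we simply equate coefficients on both sides of the equation:

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
a_{31} & a_{32} & \cdots & a_{3n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
=
\begin{pmatrix}
l_{11} & 0 & \cdots & 0 \\
l_{21} & l_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & 0 \\
l_{n1} & l_{n2} & \cdots & l_{nn}
\end{pmatrix}
\begin{pmatrix}
l_{11} & l_{21} & \cdots & l_{n1} \\
0 & l_{22} & \cdots & l_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & l_{nn}
\end{pmatrix}$$

Solving for the unknowns (the nonzero $l_{ji}$s), for $i = 1, \ldots, n$ and $j = i + 1, \ldots, n$, we get:

$$l_{ii} = \sqrt{a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2}$$

$$l_{ji} = \left( a_{ji} - \sum_{k=1}^{i-1} l_{jk} l_{ik} \right) / l_{ii}$$

Because A is symmetric and positive definite, the expression under the square root is always
positive, and all $l_{ij}$ are real.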
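
The two formulas for $l_{ii}$ and $l_{ji}$ can be turned into code almost verbatim. The following is a minimal sketch in plain Python with lists of lists; it assumes the input really is symmetric (only the lower triangle is read) and raises an error if a non-positive value appears under the square root. The function name `cholesky` is an illustrative choice.

```python
import math


def cholesky(A):
    """Return the lower triangular L with A = L * L^T, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entry: l_ii = sqrt(a_ii - sum_k l_ik^2)
        s = A[i][i] - sum(L[i][k] ** 2 for k in range(i))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[i][i] = math.sqrt(s)
        # Entries below the diagonal in column i: l_ji = (a_ji - sum_k l_jk l_ik) / l_ii
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * L[i][k] for k in range(i))) / L[i][i]
    return L


A = [[4, 12, -16],
     [12, 37, -43],
     [-16, -43, 98]]
for row in cholesky(A):
    print(row)
```

Computing column by column means every $l_{ik}$ on the right-hand side is already known when it is needed.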


References 


e Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html) 
216.2 Hadamard matrix


An n × n matrix $H = (h_{ij})$ is an Hadamard matrix of order n if the entries of H are either
+1 or −1 and such that $HH^T = nI$, where $H^T$ is the transpose of H and I is the order n
identity matrix.


In other words, an n × n matrix with only +1 and −1 as its elements is Hadamard if the
inner product of two distinct rows is 0 and the inner product of a row with itself is n.


A few examples of Hadamard matrices are

$$\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad
\begin{pmatrix} -1 & 1 & 1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$


These matrices were first considered as Hadamard determinants, because the determinant
of an Hadamard matrix satisfies equality in Hadamard's determinant theorem, which states
that if $X = (x_{ij})$ is a matrix of order n where $|x_{ij}| \le 1$ for all i and j, then

$$|\det(X)| \le n^{n/2}$$


property 1:
The order of an Hadamard matrix is 1, 2 or 4n, where n is an integer.
property 2:

If the rows and columns of an Hadamard matrix are permuted, the matrix remains Hadamard.

property 3:

If any row or column is multiplied by −1, the Hadamard property is retained.


Hence it is always possible to arrange to have the first row and first column of an Hadamard
matrix contain only +1 entries. An Hadamard matrix in this form is said to be normalized.


Hadamard matrices are common in signal processing and coding applications. 
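
The defining condition $HH^T = nI$ is a finite check on row inner products. A minimal sketch (the function name `is_hadamard` is hypothetical) that tests it for matrices given as lists of rows:

```python
def is_hadamard(H):
    """True if H is square, has entries +1/-1 only, and satisfies H * H^T = n * I."""
    n = len(H)
    if any(len(row) != n or any(h not in (1, -1) for h in row) for row in H):
        return False
    # Inner product of rows i and j must be n when i == j and 0 otherwise.
    return all(
        sum(H[i][k] * H[j][k] for k in range(n)) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )


H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]
print(is_hadamard(H4))

# Property 3: negating a row keeps the Hadamard property.
negated = [[-h for h in H4[0]]] + H4[1:]
print(is_hadamard(negated))
```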
Version: 3 Owner: giri Author(s): giri 


216.3 Hessenberg matrix 


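An upper Hessenberg matrix is of the form

$$\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1,n-1} & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2,n-1} & a_{2n} \\
0 & a_{32} & a_{33} & \cdots & a_{3,n-1} & a_{3n} \\
0 & 0 & a_{43} & \cdots & a_{4,n-1} & a_{4n} \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & a_{n,n-1} & a_{nn}
\end{pmatrix}$$

and a lower Hessenberg matrix is of the form

$$\begin{pmatrix}
a_{11} & a_{12} & 0 & \cdots & 0 & 0 \\
a_{21} & a_{22} & a_{23} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\
a_{n-2,1} & a_{n-2,2} & a_{n-2,3} & \cdots & a_{n-2,n-1} & 0 \\
a_{n-1,1} & a_{n-1,2} & a_{n-1,3} & \cdots & a_{n-1,n-1} & a_{n-1,n} \\
a_{n,1} & a_{n,2} & a_{n,3} & \cdots & a_{n,n-1} & a_{nn}
\end{pmatrix}$$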
216.4 If $A \in M_n(k)$ and A is supertriangular then $A^n = 0$


theorem: Let A be a square matrix of dimension n over a field k. If A is supertriangular,
then $A^n = 0$.


proof: Find the characteristic polynomial of A by computing the determinant of tI − A.
The square matrix tI − A is a triangular matrix. The determinant of a triangular matrix is
the product of the diagonal elements of the matrix. Therefore the characteristic polynomial is
$p(t) = t^n$ and by the Cayley-Hamilton theorem the matrix A satisfies the polynomial. That
is, $A^n = 0$.
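
A quick numerical illustration of the theorem, reading "supertriangular" as strictly upper triangular (zero on and below the diagonal). The helpers `matmul` and `matpow` are small hypothetical utilities written for this sketch.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(A, k):
    """A raised to the k-th power, for an integer k >= 1."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result


A = [[0, 1, 2],
     [0, 0, 3],
     [0, 0, 0]]
print(matpow(A, 2))  # still has a non-zero entry
print(matpow(A, 3))  # the zero matrix, as the theorem predicts for n = 3
```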


Version: 5 Owner: Daume Author(s): Daume 


216.5 Jacobi determinant 


Let

$$f = f(x) = f(x_1, \ldots, x_n)$$

be a function of n variables, and let

$$u = u(x) = (u_1(x), \ldots, u_n(x))$$

be a function of x, where inversely x can be expressed as a function of u,

$$x = x(u) = (x_1(u), \ldots, x_n(u))$$

The formula for a change of variable in an n-dimensional integral is then

$$\int_\Omega f(x)\, d^n x = \int_{u(\Omega)} f(x(u))\, |\det(dx/du)|\, d^n u$$

$\Omega$ is an integration region, and one integrates over all $x \in \Omega$, or equivalently, all $u \in u(\Omega)$.
$dx/du = (du/dx)^{-1}$ is the Jacobi matrix and

$$|\det(dx/du)| = |\det(du/dx)|^{-1}$$

is the absolute value of the Jacobi determinant or Jacobian.
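As an example, take n = 2 and

$$\Omega = \{(x_1, x_2) \mid 0 < x_1 \le 1,\ 0 < x_2 \le 1\}$$

Define

$$\rho = \sqrt{-2 \log(x_1)}, \qquad \varphi = 2\pi x_2,$$
$$u_1 = \rho \cos\varphi, \qquad u_2 = \rho \sin\varphi.$$

Then by the chain rule and definition of the Jacobi matrix,

$$\frac{du}{dx} = \frac{\partial(u_1, u_2)}{\partial(x_1, x_2)}
= \frac{\partial(u_1, u_2)}{\partial(\rho, \varphi)} \cdot \frac{\partial(\rho, \varphi)}{\partial(x_1, x_2)}
= \begin{pmatrix} \cos\varphi & -\rho \sin\varphi \\ \sin\varphi & \rho \cos\varphi \end{pmatrix}
\begin{pmatrix} -1/(\rho x_1) & 0 \\ 0 & 2\pi \end{pmatrix}$$

The Jacobi determinant is

$$\det(du/dx) = \det\{\partial(u_1, u_2)/\partial(\rho, \varphi)\} \det\{\partial(\rho, \varphi)/\partial(x_1, x_2)\}
= \rho \left( -\frac{2\pi}{\rho x_1} \right) = -\frac{2\pi}{x_1}$$

and

$$d^2x = |\det(dx/du)|\, d^2u = |\det(du/dx)|^{-1}\, d^2u
= \frac{x_1}{2\pi}\, d^2u = \frac{1}{2\pi} \exp\left(-\frac{u_1^2 + u_2^2}{2}\right) d^2u$$

This shows that if $x_1$ and $x_2$ are independent random variables with uniform distributions
between 0 and 1, then $u_1$ and $u_2$ as defined above are independent random variables with
standard normal distributions.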
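
The change of variables in this example can be checked numerically. The sketch below implements the map $(x_1, x_2) \mapsto (u_1, u_2)$ and estimates $\det(du/dx)$ by central finite differences, to be compared with $-2\pi/x_1$. The function names and the step size `h` are assumptions made for illustration.

```python
import math


def box_muller(x1, x2):
    """Map (x1, x2) in (0, 1] x (0, 1] to (u1, u2) as defined in the example."""
    rho = math.sqrt(-2.0 * math.log(x1))
    phi = 2.0 * math.pi * x2
    return rho * math.cos(phi), rho * math.sin(phi)


def jacobian_det(x1, x2, h=1e-6):
    """Central-difference estimate of det(du/dx) at (x1, x2)."""
    up, um = box_muller(x1 + h, x2), box_muller(x1 - h, x2)
    vp, vm = box_muller(x1, x2 + h), box_muller(x1, x2 - h)
    du1_dx1 = (up[0] - um[0]) / (2 * h)
    du2_dx1 = (up[1] - um[1]) / (2 * h)
    du1_dx2 = (vp[0] - vm[0]) / (2 * h)
    du2_dx2 = (vp[1] - vm[1]) / (2 * h)
    return du1_dx1 * du2_dx2 - du1_dx2 * du2_dx1


x1, x2 = 0.3, 0.7
u1, u2 = box_muller(x1, x2)
print(jacobian_det(x1, x2), -2 * math.pi / x1)
print(math.exp(-(u1**2 + u2**2) / 2), x1)
```

The second printed pair checks the identity $x_1 = \exp(-(u_1^2 + u_2^2)/2)$ that turns $x_1/2\pi$ into the Gaussian density.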


References 


e Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html) 
216.6 Jacobi's Theorem


Jacobi's Theorem If A is a skew-symmetric matrix of odd dimension, then det A = 0.


Proof. Suppose A is an n × n square matrix. For the determinant, we then have $\det A =
\det A^T$, and $\det(-A) = (-1)^n \det A$. Thus, since n is odd, and $A^T = -A$, we have $\det A =
-\det A$, and the theorem follows.


Remarks


1. According to [1], this theorem was given by Carl Gustav Jacob Jacobi (1804-1851) [2]
in 1827.


2. The 2 × 2 matrix $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ shows that Jacobi's theorem does not hold for 2 × 2
matrices. The determinant of the 2n × 2n block matrix with these 2 × 2 matrices on
the diagonal equals 1. Thus Jacobi's theorem does not hold for matrices of even
dimension.


REFERENCES 


1. H. Eves, Elementary Matrix Theory, Dover publications, 1980. 
2. The MacTutor History of Mathematics archive, 
216.7 Kronecker product
Definition. Let A be an n × n matrix (with entries $a_{ij}$) and let B be an m × m matrix. Then
the Kronecker product of A and B is the mn × mn block matrix

$$A \otimes B = \begin{pmatrix}
a_{11} B & \cdots & a_{1n} B \\
\vdots & \ddots & \vdots \\
a_{n1} B & \cdots & a_{nn} B
\end{pmatrix}$$

The Kronecker product is also known as the direct product or the tensor product [1].

Fundamental properties

1. The product is bilinear. If k is a scalar, and A, B and C are square matrices, such that
B and C are of the same dimension, then

$$\begin{aligned}
A \otimes (B + C) &= A \otimes B + A \otimes C, \\
(B + C) \otimes A &= B \otimes A + C \otimes A, \\
k(A \otimes B) &= (kA) \otimes B = A \otimes (kB).
\end{aligned}$$

2. If A, B, C, D are square matrices such that the products AC and BD exist, then
$(A \otimes B)(C \otimes D)$ exists and

$$(A \otimes B)(C \otimes D) = AC \otimes BD.$$

If A and B are invertible matrices, then

$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}.$$

3. If A and B are square matrices, then for the transpose ($A^T$) we have

$$(A \otimes B)^T = A^T \otimes B^T.$$

4. Let A and B be square matrices of dimensions n and m. If $\{\lambda_i \mid i = 1, \ldots, n\}$ are
the eigenvalues of A and $\{\mu_j \mid j = 1, \ldots, m\}$ are the eigenvalues of B, then $\{\lambda_i \mu_j \mid i =
1, \ldots, n,\ j = 1, \ldots, m\}$ are the eigenvalues of $A \otimes B$. Also,

$$\begin{aligned}
\det(A \otimes B) &= (\det A)^m (\det B)^n, \\
\operatorname{rank}(A \otimes B) &= \operatorname{rank} A \, \operatorname{rank} B, \\
\operatorname{trace}(A \otimes B) &= \operatorname{trace} A \, \operatorname{trace} B.
\end{aligned}$$
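
The block definition and a few of the properties can be tried out on small integer matrices. This is a sketch in plain Python; the helper names `kron`, `matmul`, `transpose` and `trace` are hypothetical utilities for this illustration.

```python
def kron(A, B):
    """Kronecker product: the block matrix whose (i, j) block is A[i][j] * B."""
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
D = [[1, 1], [0, 2]]

for row in kron(A, B):
    print(row)

# Property 2 (mixed product), property 3 (transpose) and the trace formula.
print(matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D)))
print(transpose(kron(A, D)) == kron(transpose(A), transpose(D)))
print(trace(kron(A, D)) == trace(A) * trace(D))
```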


REFERENCES 
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980. 


2. T. Kailath, A.H. Sayed, B. Hassibi, Linear estimation, Prentice Hall, 2000 
216.8 LU decomposition


Any non-singular matrix A can be expressed as a product A = LU; there exists exactly one
lower triangular matrix L and exactly one upper triangular matrix U of the form:

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
l_{21} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
l_{n1} & l_{n2} & \cdots & 1
\end{pmatrix}
\begin{pmatrix}
u_{11} & u_{12} & \cdots & u_{1n} \\
0 & u_{22} & \cdots & u_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & u_{nn}
\end{pmatrix}$$

if row exchanges (partial pivoting) are not necessary. With pivoting, we have to introduce a
permutation matrix P. Instead of A one then decomposes PA:

$$PA = LU$$

The LU decomposition can be performed in a way similar to Gaussian elimination.


LU decomposition is useful, e.g. for the solution of the system of linear equations
Ax = b, when there is more than one right-hand side b. With A = LU the system becomes

$$LUx = b$$

or

$$Lc = b \quad \text{and} \quad Ux = c$$

c can be computed by forward substitution and x by back substitution.
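
The factor-once, solve-many workflow described above can be sketched as follows. This version assumes no row exchanges are needed (it raises an error on a zero pivot rather than pivoting) and uses exact rational arithmetic; the function names `lu_decompose` and `lu_solve` are illustrative.

```python
from fractions import Fraction


def lu_decompose(A):
    """Return (L, U) with A = L * U, L unit lower triangular, U upper triangular.

    No pivoting: raises ZeroDivisionError if a row exchange would be required.
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ZeroDivisionError("zero pivot: a row exchange is needed")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]       # elimination multiplier
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]  # eliminate entry (i, k)
    return L, U


def lu_solve(L, U, b):
    """Solve L*U*x = b: forward substitution for c, then back substitution for x."""
    n = len(b)
    c = [Fraction(0)] * n
    for i in range(n):
        c[i] = b[i] - sum(L[i][k] * c[k] for k in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
L, U = lu_decompose(A)
# The same factors are reused for several right-hand sides.
print(lu_solve(L, U, [5, -2, 9]))
print(lu_solve(L, U, [1, 0, 0]))
```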
216.9 Peetre's inequality


Theorem [Peetre's inequality] [1, 2] If t is a real number and x, y are vectors in $\mathbb{R}^n$, then

$$\left( \frac{1 + |x|^2}{1 + |y|^2} \right)^t \le 2^{|t|} (1 + |x - y|^2)^{|t|}.$$


Proof. (Following [1].) Suppose b and c are vectors in $\mathbb{R}^n$. Then, from $(|b| - |c|)^2 \ge 0$, we
obtain

$$2|b| \cdot |c| \le |b|^2 + |c|^2.$$

Using this and the Cauchy-Schwarz inequality, we obtain

$$\begin{aligned}
1 + |b - c|^2 &= 1 + |b|^2 - 2b \cdot c + |c|^2 \\
&\le 1 + |b|^2 + 2|b||c| + |c|^2 \\
&\le 1 + 2|b|^2 + 2|c|^2 \\
&\le 2(1 + |b|^2 + |c|^2 + |b|^2 |c|^2) \\
&= 2(1 + |b|^2)(1 + |c|^2).
\end{aligned}$$

Let us define a = b − c. Then for any vectors a and b, we have

$$\frac{1 + |a|^2}{1 + |b|^2} \le 2(1 + |a - b|^2). \qquad (216.9.1)$$

Let us now return to the given inequality. If t = 0, the claim is trivially true for all x, y in
$\mathbb{R}^n$. If t > 0, then raising both sides in inequality (216.9.1) to the power of t, using t = |t|, and
setting a = x, b = y yields the result. On the other hand, if t < 0, then raising both sides
in inequality (216.9.1) to the power of −t, using −t = |t|, and setting a = y, b = x yields the
result.


REFERENCES 


1. J. Barros-Neto, An introduction to the theory of distributions, Marcel Dekker, Inc.,
1973.

2. F. Treves, Introduction To Pseudodifferential and Fourier Integral Operators, Vol. I, 
Plenum Press, 1980. 
216.10 Schur decomposition


If A is a complex square matrix of dimension n (i.e. $A \in \mathrm{Mat}_n(\mathbb{C})$), then there exists a
unitary matrix $Q \in \mathrm{Mat}_n(\mathbb{C})$ such that

$$Q^H A Q = T = D + N$$

where $^H$ is the conjugate transpose, $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ (the $\lambda_i$ are eigenvalues of A), and
$N \in \mathrm{Mat}_n(\mathbb{C})$ is a strictly upper triangular matrix. Furthermore, Q can be chosen such that
the eigenvalues $\lambda_i$ appear in any order along the diagonal.

REFERENCES


[GVL] Gene H. Golub, Charles F. Van Loan: Matrix Computations (Third Edition). The Johns
Hopkins University Press, London, 1996.
216.11 antipodal


Definition Suppose x and y are points on the n-sphere $S^n$. If x = −y then x and y are called
antipodal points. The antipodal map is the map $A : S^n \to S^n$ defined as A(x) = −x.


Properties


1. The antipodal map $A : S^n \to S^n$ is homotopic to the identity map if n is odd [1].
2. The degree of the antipodal map is $(-1)^{n+1}$.


REFERENCES 


1. V. Guillemin, A. Pollack, Differential topology, Prentice-Hall Inc., 1974. 
216.12 conjugate transpose


Definition If A is a complex matrix, then the conjugate transpose $A^*$ is the matrix
$A^* = \overline{A}^T$, where $\overline{A}$ is the complex conjugate of A, and $A^T$ is the transpose of A.


It is clear that for real matrices, the conjugate transpose coincides with the transpose.


Properties


1. If A and B are complex matrices of the same dimension, and $\alpha, \beta$ are complex constants,
then

$$(\alpha A + \beta B)^* = \overline{\alpha} A^* + \overline{\beta} B^*,$$
$$A^{**} = A.$$

2. If A and B are complex matrices such that AB is defined, then

$$(AB)^* = B^* A^*.$$

3. If A is a complex square matrix, then

$$\det(A^*) = \overline{\det A},$$
$$\operatorname{trace}(A^*) = \overline{\operatorname{trace} A},$$
$$(A^*)^{-1} = (A^{-1})^*,$$

where trace and det are the trace and the determinant operators, and $^{-1}$ is the inverse
operator.


4. Suppose $\langle \cdot, \cdot \rangle$ is the standard inner product on $\mathbb{C}^n$. Then for an arbitrary complex
n × n matrix A, and vectors $x, y \in \mathbb{C}^n$, we have

$$\langle Ax, y \rangle = \langle x, A^* y \rangle.$$


Notes


The conjugate transpose of A is also called the adjoint matrix of A, or the Hermitian
conjugate of A (whence one usually writes $A^* = A^H$). The notation $A^\dagger$ is also used for the
conjugate transpose [2]. In [1], $A^*$ is also called the tranjugate of A.


REFERENCES 


1. H. Eves, Elementary Matrix Theory, Dover publications, 1980. 
2. M. C. Pease, Methods of Matrix Algebra, Academic Press, 1965. 


See also 
• Wikipedia, conjugate transpose
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216.13 corollary of Schur decomposition 


theorem: $A \in \mathbb{C}^{n \times n}$ is a normal matrix if and only if there exists a unitary matrix $Q \in \mathbb{C}^{n \times n}$
such that $Q^H A Q = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ (a diagonal matrix), where $^H$ is the conjugate transpose.

proof: Firstly we show that if there exists a unitary matrix $Q \in \mathbb{C}^{n \times n}$ such that $Q^H A Q =
\operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, then $A \in \mathbb{C}^{n \times n}$ is a normal matrix. Let $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$; then $A$ may
be written as $A = Q D Q^H$. That $A$ is normal follows from the observation that
$A A^H = Q D Q^H Q D^H Q^H = Q D D^H Q^H$ and $A^H A = Q D^H Q^H Q D Q^H = Q D^H D Q^H$.
Therefore $A$ is a normal matrix, because $D D^H = \operatorname{diag}(\lambda_1 \overline{\lambda_1}, \ldots, \lambda_n \overline{\lambda_n}) = D^H D$.

Secondly we show that if $A \in \mathbb{C}^{n \times n}$ is a normal matrix, then there exists a unitary matrix
$Q \in \mathbb{C}^{n \times n}$ such that $Q^H A Q = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. By the Schur decomposition we know
that there exists a unitary matrix $Q \in \mathbb{C}^{n \times n}$ such that $Q^H A Q = T$, where $T$ is an upper triangular matrix.
Since $A$ is a normal matrix and $Q$ is unitary, $T$ is also a normal matrix. That $T$ is a
diagonal matrix follows from the fact that a normal upper triangular matrix is diagonal (see the
theorem for normal triangular matrices).


REFERENCES 


[GVL] Golub, Gene H., Van Loan, Charles F.: Matrix Computations (Third Edition). The Johns
Hopkins University Press, London, 1996.
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216.14 covector 


If $V$ is a vector space over a field $k$, then a covector is a linear map $\alpha : V \to k$, that is, an
element of the dual space to $V$. Thus, for example, a covector field on a differentiable manifold
is a synonym for a 1-form.
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216.15 diagonal matrix 
Definition [1] Let $A$ be a square matrix (with entries in any field). If all off-diagonal entries
of $A$ are zero, then $A$ is a diagonal matrix.


From the definition, we see that an $n \times n$ diagonal matrix is completely determined by the
$n$ entries on the diagonal; all other entries are zero. If the diagonal entries are $a_1, a_2, \ldots, a_n$,
then we denote the corresponding diagonal matrix by

$$\operatorname{diag}(a_1, \ldots, a_n) = \begin{pmatrix} a_1 & 0 & 0 & \cdots & 0 \\ 0 & a_2 & 0 & \cdots & 0 \\ 0 & 0 & a_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_n \end{pmatrix}.$$


Examples 


1. The identity matrix and zero matrix are diagonal matrices. Also, any $1 \times 1$ matrix is
a diagonal matrix.

2. A matrix $A$ is a diagonal matrix if and only if $A$ is both an upper and a lower triangular matrix.


Properties 


1. If $A$ and $B$ are diagonal matrices of the same dimension, then $A + B$ and $AB$ are again
diagonal matrices. Further, diagonal matrices commute, i.e., $AB = BA$. It follows that
real (and complex) diagonal matrices are normal matrices.
2. A square matrix is diagonal if and only if it is triangular and normal (see the theorem for normal triangular matrices).


3. The eigenvalues of a diagonal matrix $A = \operatorname{diag}(a_1, \ldots, a_n)$ are $a_1, \ldots, a_n$. In consequence,
for the determinant we have $\det A = a_1 a_2 \cdots a_n$, so $A$ is invertible if and only
if all $a_i$ are non-zero. Then the inverse is given by

$$\bigl(\operatorname{diag}(a_1, \ldots, a_n)\bigr)^{-1} = \operatorname{diag}(1/a_1, \ldots, 1/a_n).$$


4. If A is a diagonal matrix, then the of A is also a diagonal matrix [I]. 


Remarks 


Diagonal matrices are also sometimes called quasi-scalar matrices [1].


REFERENCES 


1. H. Eves, Elementary Matrix Theory, Dover publications, 1980. 


2. Wikipedia, diagonal matrix
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216.16 diagonalization 


Let $V$ be a finite-dimensional vector space over a field $K$, and $T : V \to V$ a linear transformation.
To diagonalize $T$ is to find a basis of $V$ that consists of eigenvectors. The transformation is
called diagonalizable if such a basis exists. The choice of terminology reflects the following.

Proposition 5. The matrix of $T$ relative to a given basis is diagonal if and only if the
basis in question consists of eigenvectors.


Next, we give necessary and sufficient conditions for $T$ to be diagonalizable. For $\lambda \in K$ set
$$E_\lambda = \{u \in V : Tu = \lambda u\}.$$
The set $E_\lambda$ is a subspace of $V$ called the eigenspace associated to $\lambda$. This subspace is
non-trivial if and only if $\lambda$ is an eigenvalue of $T$.

Proposition 6. A transformation is diagonalizable if and only if
$$\dim V = \sum_\lambda \dim E_\lambda,$$
where the sum is taken over all eigenvalues of the transformation.


There are two fundamental reasons why a transformation T can fail to be diagonalizable. 


1. The characteristic polynomial of $T$ does not factor into linear factors over $K$.

2. There exists an eigenvalue $\lambda$ such that the kernel of $(T - \lambda I)^2$ is strictly greater than
the kernel of $(T - \lambda I)$. Equivalently, there exists an invariant subspace where $T$ acts
as a non-zero nilpotent transformation plus some multiple of the identity.
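The second obstruction can be seen on the smallest possible case. The worked example below is our own illustration (the shear matrix is not taken from the original entry); it takes $K = \mathbb{R}$ and $V = \mathbb{R}^2$, and checks the kernel condition as well as the dimension count of Proposition 6.

```latex
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
\det(\lambda I - A) = (\lambda - 1)^2, \qquad
A - I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
(A - I)^2 = 0 .

\ker (A - I) = \operatorname{span}\{(1, 0)^T\} \subsetneq \mathbb{R}^2 = \ker (A - I)^2 ,
\qquad
\sum_\lambda \dim E_\lambda = \dim E_1 = 1 < 2 = \dim V .
```

The characteristic polynomial factors into linear factors, so the first obstruction is absent, yet $A$ is not diagonalizable: it acts as the identity plus the non-zero nilpotent matrix $A - I$.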
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216.17 diagonally dominant matrix 


Let $A$ be a square matrix (possibly complex) of dimension $n$ with entries $a_{ij}$. Then $A$ is said
to be diagonally dominant if

$$|a_{ii}| \ge \sum_{j=1,\ j \ne i}^{n} |a_{ij}|$$

for $i$ from 1 to $n$.
In addition $A$ is said to be strictly diagonally dominant if

$$|a_{ii}| > \sum_{j=1,\ j \ne i}^{n} |a_{ij}|$$

for $i$ from 1 to $n$.
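The two row conditions translate directly into a short test. This is a minimal sketch under our own assumptions: the function name `is_diagonally_dominant`, the `strict` flag and the sample matrix are illustrative and not part of the original entry; the matrix is stored as a list of rows and may contain complex entries.

```python
def is_diagonally_dominant(a, strict=False):
    """Check row-wise diagonal dominance of a square matrix (list of rows)."""
    for i, row in enumerate(a):
        # Sum of the moduli of the off-diagonal entries in row i.
        off_diagonal = sum(abs(v) for j, v in enumerate(row) if j != i)
        if strict:
            if not abs(row[i]) > off_diagonal:
                return False
        elif not abs(row[i]) >= off_diagonal:
            return False
    return True


# Hypothetical example: the last row satisfies |a_33| = |a_31| + |a_32| only with equality.
M = [[3, -1, 1],
     [1, 4, 2],
     [0, 2, 2]]

print(is_diagonally_dominant(M))               # weak condition
print(is_diagonally_dominant(M, strict=True))  # strict condition
```

For this matrix the weak condition holds in every row, while the strict one fails in the last row.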
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216.18 eigenvalue (of a matrix) 


Let $A$ be an $n \times n$ matrix of complex numbers. A number $\lambda \in \mathbb{C}$ is said to be an eigenvalue
of $A$ if there is a nonzero $n \times 1$ column vector $x$ for which
$$Ax = \lambda x.$$


This definition raises several natural questions, among them: Does any matrix of complex 
numbers have eigenvalues? How many different eigenvalues can a matrix have? Given a 
matrix, how does one compute its eigenvalues? 


The answers to the above questions are usually studied in introductory linear algebra courses,
usually in the following sequence:

• One learns that $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ precisely when $\lambda$ satisfies
$$\det(\lambda I - A) = 0,$$
where $I$ denotes the $n \times n$ identity matrix and $\det$ is the determinant.

• Basic facts about the determinant imply that $\det(\lambda I - A)$ is a polynomial in $\lambda$ of degree
$n$. This is often referred to as the characteristic polynomial of $A$. (Note: some define
the characteristic polynomial to be $\det(A - \lambda I)$; for the purposes of finding eigenvalues
of $A$ it makes no difference.)

• From the fundamental theorem of algebra we know that any polynomial of degree $n \ge 1$ with complex
coefficients has at least one complex root, and at most $n$ distinct complex roots.

• It follows that any matrix of complex numbers $A$ has at least one eigenvalue, and at
most $n$ distinct eigenvalues.


If one is given an $n \times n$ matrix $A$ of real numbers, the above argument implies that $A$ has
at least one complex eigenvalue; the question of whether or not $A$ has real eigenvalues is
more subtle, since there is no real-numbers analogue of the fundamental theorem of algebra.
It should not be a surprise then that some real matrices do not have real eigenvalues. For
example, let
$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

In this case $\det(\lambda I - A) = \lambda^2 + 1$; clearly no real number $\lambda$ satisfies $\lambda^2 + 1 = 0$; hence $A$ has
no real eigenvalues (although $A$ has the complex eigenvalues $i$ and $-i$).
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If one converts the above theory into an algorithm for calculating the eigenvalues of a matrix 
A, one is led to a two-step procedure: 


• Compute the polynomial $\det(\lambda I - A)$.
• Solve $\det(\lambda I - A) = 0$.

Unfortunately, computing $n \times n$ determinants and finding roots of polynomials of degree $n$
are both computationally messy procedures for even moderately large $n$, so for most practical
purposes variations on this naive scheme are needed. See the eigenvalue problem for more
information.

Remark: The definition of an eigenvalue of an endomorphism is given in a separate entry. The present
entry is an extract of version 6 of that entry.
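For $2 \times 2$ matrices the naive two-step procedure is still perfectly practical, because the characteristic polynomial is the quadratic $\lambda^2 - (\operatorname{trace} A)\lambda + \det A$. The sketch below is our own illustration (the function name `eigenvalues_2x2` is hypothetical); it uses `cmath.sqrt` so that complex roots are returned even for real input.

```python
import cmath


def eigenvalues_2x2(a):
    """Roots of det(lambda*I - A) = lambda^2 - trace(A)*lambda + det(A)."""
    (p, q), (r, s) = a
    trace = p + s
    det = p * s - q * r
    # Step 2 of the naive scheme: solve the quadratic with the usual formula.
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


# The real matrix from the example above has no real eigenvalues.
print(eigenvalues_2x2([[0, -1], [1, 0]]))
# A diagonal matrix has its diagonal entries as eigenvalues.
print(eigenvalues_2x2([[2, 0], [0, 3]]))
```

For larger $n$ no such closed formula is available in general, which is one reason the iterative methods mentioned in the text are preferred.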
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216.19 eigenvalue problem 


The eigenvalue problem appears as part of the solution in many scientific or engineering
applications. An example of where it arises is the determination of the main axes of a
second order surface $Q = x^T A x = 1$ (with a symmetric matrix $A$). The task is to find the
places where the normal
$$\nabla Q = \left( \frac{\partial Q}{\partial x_1}, \ldots, \frac{\partial Q}{\partial x_n} \right) = 2Ax$$
is parallel to the vector $x$, i.e. $Ax = \lambda x$.
[picture to go here]

A solution $x$ of the above equation with $x^T A x = 1$ has the squared distance $x^T x = d^2$ from
the origin. Therefore, $\lambda\, x^T x = \lambda d^2 = 1$ and $d^2 = 1/\lambda$. The main axes are $a_i = 1/\sqrt{\lambda_i}$ $(i = 1, \ldots, n)$.


The general algebraic eigenvalue problem is given by
$$Ax = \lambda x, \quad \text{or} \quad (A - \lambda I)x = 0,$$
with $I$ the identity matrix, with an arbitrary square matrix $A$, an unknown scalar $\lambda$, and the
unknown vector $x$. A non-trivial solution to this system of $n$ linear homogeneous equations
exists if and only if the determinant vanishes:

$$\det(A - \lambda I) = \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{vmatrix} = 0.$$

This $n$th degree polynomial equation in $\lambda$ is called the characteristic equation. Its roots $\lambda$ are called
the eigenvalues and the corresponding vectors $x$ the eigenvectors. In the example, $x$ is a right
eigenvector for $\lambda$; a left eigenvector $y$ is defined by $y^T A = \mu y^T$.


Solving this polynomial for $\lambda$ is not a practical method to solve the eigenvalue problem; a
QR-based method is a much more adequate tool ([Golub89]); it works as follows:

$A$ is reduced to the (upper) Hessenberg matrix $H$ or, if $A$ is symmetric, to a tridiagonal matrix
$T$.

This is done with a "similarity transform": if $S$ is a non-singular $n \times n$ matrix, then $Ax = \lambda x$
is transformed to $SAx = \lambda Sx$, that is $SAS^{-1}Sx = \lambda Sx$, or $By = \lambda y$ with $y = Sx$ and $B = SAS^{-1}$,
i.e. $A$ and $B$ share the same eigenvalues (not the eigenvectors). We will choose Householder transformations for $S$.
The eigenvalues are then found by applying iteratively the
QR decomposition, i.e. the Hessenberg (or tridiagonal) matrix $H$ will be decomposed into
upper triangular matrices $R$ and orthogonal matrices $Q$.


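The algorithm is surprisingly simple: $H = H_1$ is decomposed into $H_1 = Q_1 R_1$, then an $H_2$
is computed, $H_2 = R_1 Q_1$. $H_2$ is similar to $H_1$ because $H_2 = R_1 Q_1 = Q_1^{-1} H_1 Q_1$, and is
decomposed to $H_2 = Q_2 R_2$. Then $H_3$ is formed, $H_3 = R_2 Q_2$, etc. In this way a sequence of
$H_i$'s (with the same eigenvalues) is generated, that finally converges to (for conditions, see
[Golub89])

$$\begin{pmatrix} \lambda_1 & \times & \times & \cdots & \times \\ 0 & \lambda_2 & \times & \cdots & \times \\ 0 & 0 & \lambda_3 & \cdots & \times \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

for the Hessenberg and

$$\begin{pmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

for the tridiagonal.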
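The iteration $H_{k+1} = R_k Q_k$ can be tried out in a few lines. The following is only a sketch under stated assumptions, not the method of [Golub89]: it omits the Hessenberg reduction and the shifts used in practice, computes the QR decomposition by classical Gram-Schmidt (which assumes a non-singular matrix), and all function names are our own.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def qr_gram_schmidt(h):
    """Factor h = q r, q orthogonal and r upper triangular (h assumed non-singular)."""
    n = len(h)
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [h[i][j] for i in range(n)]
        for k, qk in enumerate(q_cols):
            # Remove the component of column j along the earlier orthonormal vector.
            r[k][j] = sum(qk[i] * h[i][j] for i in range(n))
            v = [v[i] - r[k][j] * qk[i] for i in range(n)]
        r[j][j] = math.sqrt(sum(c * c for c in v))
        q_cols.append([c / r[j][j] for c in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def qr_iteration(h, steps=50):
    """Repeat H_k = Q_k R_k, H_{k+1} = R_k Q_k a fixed number of times."""
    for _ in range(steps):
        q, r = qr_gram_schmidt(h)
        h = matmul(r, q)
    return h


# Symmetric (hence already tridiagonal) 2 x 2 example with eigenvalues 3 and 1.
H = qr_iteration([[2.0, 1.0], [1.0, 2.0]])
print(H)
```

For this symmetric example the iterates approach a diagonal matrix whose diagonal entries are the eigenvalues, in agreement with the tridiagonal case described above.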


References 


• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)


Golub89 Gene H. Golub and Charles F. van Loan: Matrix Computations, 2nd edn., The Johns
Hopkins University Press, 1989.
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216.20 eigenvalues of orthogonal matrices 


Theorem Let $A$ be an $n \times n$ orthogonal matrix. Then the following hold:

1. The characteristic polynomial $p(\lambda) = \det(A - \lambda I)$ of $A$ is a reciprocal polynomial, that
is,
$$p(\lambda) = \pm \lambda^n p(1/\lambda).$$

2. If $\lambda$ is an eigenvalue of $A$, then so is $1/\lambda$.

3. If $n$ is odd, then either $1$ or $-1$ is an eigenvalue of $A$.

4. All eigenvalues have unit modulus. In other words, if $\lambda$ is an eigenvalue, then $|\lambda| = 1$.
Here, $|\lambda|$ is the complex modulus of $\lambda$.


Proof. Since $A^{-1} = A^T$, we have $A - \lambda I = -\lambda A (A^T - I/\lambda)$. Taking the determinant of
both sides, and using $\det A = \det A^T = \pm 1$ and $\det cA = c^n \det A$ ($c \in \mathbb{C}$), yields

$$\det(A - \lambda I) = \pm \lambda^n \det\left(A - \frac{1}{\lambda} I\right),$$

and property (1) follows. For property (2), let us first note that since $A$ is orthogonal, no
eigenvalue can be 0. Thus, $p(\lambda) = 0$ implies that $p(1/\lambda) = 0$, and property (2) follows.
For property (3), suppose $n$ is odd. Then $A$ has an odd number of eigenvalues (including
multiplicities) that all satisfy property (2). Therefore, there must exist at least one eigenvalue
$\lambda$ such that $\lambda = 1/\lambda$. Then $\lambda^2 = 1$, that is, $\lambda = 1$ or $\lambda = -1$. For part (4), let $\lambda$ be an eigenvalue corresponding to
an eigenvector $x$, i.e., $Ax = \lambda x$. Taking the conjugate transpose gives $x^* A^* = \overline{\lambda} x^*$.
Here $x^*$ is the row vector corresponding to $x$, where each entry is complex conjugated. Also,
$A^* = \overline{A}^T$, where $A^T$ is the transpose of $A$ and $\overline{A}$ is the complex conjugate of $A$. Since $A$ is
real, we then have $x^* A^T = \overline{\lambda} x^*$. Thus

$$x^* x = x^* A^T A x = \overline{\lambda} \lambda\, x^* x.$$

As an eigenvector, $x$ is non-zero, so $x^* x \ne 0$ and $|\lambda| = 1$.
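Properties (2) and (4) are easy to observe on a plane rotation, which is an orthogonal matrix. The sketch below is our own illustration: the angle `theta` is an arbitrary choice, and the eigenvalues are obtained from the quadratic characteristic polynomial of a $2 \times 2$ matrix rather than from a general-purpose routine.

```python
import cmath
import math

theta = 0.7  # arbitrary rotation angle, chosen only for illustration
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

# Roots of lambda^2 - trace(A)*lambda + det(A) = 0.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = cmath.sqrt(trace * trace - 4 * det)
lam1, lam2 = (trace + disc) / 2, (trace - disc) / 2

print(abs(lam1), abs(lam2))  # both moduli are 1 up to rounding
print(1 / lam1, lam2)        # the eigenvalues are reciprocals of each other
```

Here $n = 2$ is even, so property (3) does not apply, and indeed neither eigenvalue is real unless `theta` is a multiple of $\pi$.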


These results can be found in [1] (page 268). In the same reference, it is mentioned that
properties (1) and (3) can essentially be found in a paper published in 1854 by Francesco
Brioschi (1824-1897). Later in the same year, an improved proof was given by F. Faà di
Bruno (1825-1888) [1]. Biographies of Brioschi and Faà di Bruno can be found at the
MacTutor History of Mathematics archive [2], [3].


REFERENCES 


1. H. Eves, Elementary Matrix Theory, Dover publications, 1980. 


2. The MacTutor History of Mathematics archive, entry on Francesco Brioschi 
3. The MacTutor History of Mathematics archive, entry on Francesco Faà di Bruno 
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216.21 eigenvector 


Let $A$ be an $n \times n$ square matrix and $x$ an $n \times 1$ column vector. Then the eigenvectors of
$A$ are the nonzero vectors $x$ such that

$$Ax = \lambda x$$

for some scalar $\lambda$. In other words, these vectors become multiples of themselves when transformed by $A$.
One can find eigenvectors by first finding eigenvalues, then for each eigenvalue $\lambda_i$, solving
the system

$$(A - \lambda_i I)\, x_i = 0$$

to find a form which characterizes the eigenvector $x_i$ (any non-zero multiple of $x_i$ is also an
eigenvector). Of course this is not the smart way to do it in practice.
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216.22 exactly determined 


An exactly determined system of linear equations has precisely as many unknowns as
equations; it has a unique solution whenever its coefficient matrix is non-singular.
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216.23 free vector space over a set 


In this entry we construct the free vector space over a set, or the vector space
generated by a set [1]. For a set $X$, we shall denote this vector space by $C(X)$. One
application of this construction is given in [2], where the free vector space is used to define the
tensor product for modules.

To define the vector space $C(X)$, let us first define $C(X)$ as a set. For a set $X$ and a field
$K$, we define

$$C(X) = \{ f : X \to K \mid f^{-1}(K \setminus \{0\}) \text{ is finite} \}.$$

In other words, $C(X)$ consists of functions $f : X \to K$ that are non-zero only at finitely
many points in $X$. Here, we denote the identity element in $K$ by 1, and the zero element
by 0. The vector space structure for $C(X)$ is defined as follows. If $f$ and $g$ are functions
in $C(X)$, then $f + g$ is the mapping $x \mapsto f(x) + g(x)$. Similarly, if $f \in C(X)$ and $\alpha \in K$,
then $\alpha f$ is the mapping $x \mapsto \alpha f(x)$. It is not difficult to see that these operations are well
defined, i.e., both $f + g$ and $\alpha f$ are again functions in $C(X)$.


Basis for C(X) 


If $a \in X$, let us define the function $\Delta_a \in C(X)$ by

$$\Delta_a(x) = \begin{cases} 1 & \text{when } x = a, \\ 0 & \text{otherwise.} \end{cases}$$

These functions form a linearly independent basis for $C(X)$, i.e.,
$$C(X) = \operatorname{span}\{\Delta_a\}_{a \in X}. \qquad (216.23.1)$$
Here, the space $\operatorname{span}\{\Delta_a\}_{a \in X}$ consists of all finite linear combinations of elements in $\{\Delta_a\}_{a \in X}$.
It is clear that any element in $\operatorname{span}\{\Delta_a\}_{a \in X}$ is a member of $C(X)$. Let us check the other
direction. Suppose $f$ is a member of $C(X)$. Then, let $\xi_1, \ldots, \xi_N$ be the distinct points in $X$
where $f$ is non-zero. We then have

$$f = \sum_{i=1}^{N} f(\xi_i)\, \Delta_{\xi_i},$$

and we have established equality in equation 216.23.1.

To see that the set $\{\Delta_a\}_{a \in X}$ is linearly independent, we need to show that any finite
subset of it is linearly independent. Let $\{\Delta_{\xi_1}, \ldots, \Delta_{\xi_N}\}$ be such a finite subset, and suppose
$\sum_{i=1}^{N} \alpha_i \Delta_{\xi_i} = 0$ for some $\alpha_i \in K$. Since the points $\xi_i$ are pairwise distinct, evaluating both sides at $\xi_j$ gives
$\alpha_j = 0$ for all $j$. This shows that the set $\{\Delta_a\}_{a \in X}$ is linearly independent.


Let us define the mapping $\iota : X \to C(X)$, $x \mapsto \Delta_x$. This mapping gives a bijection between
$X$ and the basis vectors $\{\Delta_a\}_{a \in X}$. We can thus identify these spaces. Then $X$ becomes a
linearly independent basis for $C(X)$.
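The construction can be modelled concretely by storing an element of $C(X)$ as a dictionary that records only its finitely many non-zero values. The sketch below rests on our own assumptions: $K$ is taken to be the rationals (via `fractions.Fraction`), $X$ is a set of strings, and the names `delta`, `add`, `scale` and `extend` are hypothetical. The function `extend` illustrates how a map on $X$ extends linearly to $C(X)$, here with values in $K$ itself.

```python
from fractions import Fraction


def delta(a):
    """Basis function Delta_a: equal to 1 at a and 0 elsewhere."""
    return {a: Fraction(1)}


def add(f, g):
    """Pointwise sum of two finitely supported functions."""
    h = dict(f)
    for x, c in g.items():
        h[x] = h.get(x, Fraction(0)) + c
    # Drop zero values so that the dictionary lists exactly the support.
    return {x: c for x, c in h.items() if c != 0}


def scale(alpha, f):
    """Pointwise scalar multiple alpha * f."""
    return {x: alpha * c for x, c in f.items() if alpha * c != 0}


def extend(phi):
    """Linear extension to C(X) of a map phi from X to K."""
    return lambda f: sum((c * phi(x) for x, c in f.items()), Fraction(0))


# f = 2*Delta_a - 3*Delta_b, a finite linear combination of basis functions.
f = add(scale(Fraction(2), delta("a")), scale(Fraction(-3), delta("b")))
phi = {"a": Fraction(5), "b": Fraction(7)}.get
phi_bar = extend(phi)

print(f)
print(phi_bar(f))           # 2*5 - 3*7
print(phi_bar(delta("a")))  # agrees with phi("a")
```

Keeping only non-zero values in the dictionary mirrors the finiteness condition in the definition of $C(X)$.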


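Universal property of $\iota : X \to C(X)$


The mapping $\iota : X \to C(X)$ is universal in the following sense. If $\varphi$ is an arbitrary mapping
from $X$ to a vector space $V$ over $K$, then there exists a unique linear mapping $\bar\varphi : C(X) \to V$ such that the below
diagram commutes, i.e., $\bar\varphi \circ \iota = \varphi$:

$$\begin{array}{ccc} X & \xrightarrow{\ \varphi\ } & V \\ {\scriptstyle \iota} \downarrow & \nearrow {\scriptstyle \bar\varphi} & \\ C(X) & & \end{array}$$

Proof. We define $\bar\varphi$ as the linear mapping that maps the basis elements of $C(X)$ as $\bar\varphi(\Delta_x) =
\varphi(x)$. Then, by definition, $\bar\varphi$ is linear. For uniqueness, suppose that there are linear mappings
$\bar\varphi, \bar\sigma : C(X) \to V$ such that $\varphi = \bar\varphi \circ \iota = \bar\sigma \circ \iota$. For all $x \in X$, we then have $\bar\varphi(\Delta_x) = \bar\sigma(\Delta_x)$.
Thus $\bar\varphi = \bar\sigma$, since both mappings are linear and they coincide on the basis elements. □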


REFERENCES 


1. W. Greub, Linear Algebra, Springer-Verlag, Fourth edition, 1975. 
2. I. Madsen, J. Tornehave, From Calculus to Cohomology, Cambridge University press, 1997. 
216.24 in a vector space, λv = 0 if and only if λ = 0 or
v is the zero vector


Theorem Let $V$ be a vector space over the field $F$. Further, let $\lambda \in F$ and $v \in V$. Then
$\lambda v = 0$ if and only if $\lambda$ is zero, or if $v$ is the zero vector, or if both $\lambda$ and $v$ are zero.


Proof. Let us denote by $0_F$ and by $1_F$ the zero respectively unit elements in $F$. Similarly,
we denote by $0_V$ the zero vector in $V$. Suppose $\lambda = 0_F$. Then, by axiom 8, we have that

$$1_F v + 0_F v = 1_F v$$

for all $v \in V$. By axiom 6, there is an element in $V$ that cancels $1_F v$. Adding this element
to both sides yields $0_F v = 0_V$. Next, suppose that $v = 0_V$. We claim that $\lambda 0_V = 0_V$ for all
$\lambda \in F$. This follows from the previous claim if $\lambda = 0_F$, so let us assume that $\lambda \ne 0_F$. Then
$\lambda^{-1}$ exists, and axiom 7 implies that

$$\lambda \lambda^{-1} v + \lambda 0_V = \lambda(\lambda^{-1} v + 0_V)$$
holds for all $v \in V$. Then using axiom 3, we have that
$$v + \lambda 0_V = v$$
for all $v \in V$. Thus $\lambda 0_V$ satisfies the axiom for the zero vector, and $\lambda 0_V = 0_V$ for all $\lambda \in F$.
For the other direction, suppose $\lambda v = 0_V$ and $\lambda \ne 0_F$. Then, using axiom 3, we have that
$$v = 1_F v = \lambda^{-1} \lambda v = \lambda^{-1} 0_V = 0_V.$$

On the other hand, suppose $\lambda v = 0_V$ and $v \ne 0_V$. If $\lambda \ne 0_F$, then the above calculation for $v$
is again valid, whence

$$0_V \ne v = 0_V,$$

which is a contradiction, so $\lambda = 0_F$. □


This result with proof can be found in [1], page 6.


REFERENCES 


1. W. Greub, Linear Algebra, Springer-Verlag, Fourth edition, 1975. 
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216.25 invariant subspace 


Let $T : V \to V$ be a linear transformation of a vector space $V$. A subspace $U \subset V$ is called
an invariant subspace of $T$ if
$$T(U) \subset U.$$

If $U$ is an invariant subspace, then the restriction of $T$ to $U$ defines a well defined linear
transformation of $U$.
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216.26 least squares 


The general problem to be solved by the least squares method is this: given some direct
measurements $y$ of random variables, and knowing a set of equations $f$ which have to be
satisfied by these measurements, possibly involving unknown parameters $x$, find the set of $x$
which comes closest to satisfying

$$f(x, y) = 0,$$

where "closest" is defined by a $\Delta y$ such that

$$f(x, y + \Delta y) = 0 \quad \text{and} \quad \Delta y^2 \text{ is minimized.}$$

The sum of squares of elements of a vector can be written in different ways:
$$\Delta y^2 = \Delta y^T \Delta y = \|\Delta y\|_2^2 = \sum_i \Delta y_i^2.$$

The assumption has been made here that the elements of $y$ are statistically uncorrelated and
have equal variance. For this case, the above solution results in the most efficient estimators
for $x$, $\Delta y$. If the $y$ are correlated, correlations and variances are defined by a covariance
matrix $C$, and the above minimum condition becomes

$$\Delta y^T C^{-1} \Delta y \text{ is minimized.}$$


Least squares solutions can be more or less simple, depending on the constraint equations
$f$. If there is exactly one equation for each measurement, and the functions $f$ are linear in
the elements of $y$ and $x$, the solution is discussed under linear regression. For other linear
models, see linear least squares. Least squares methods applied to few parameters can lend
themselves to very efficient algorithms (e.g. in real-time image processing), as they reduce
to simple matrix operations.


If the constraint equations are non-linear, one typically solves by linearization and in iterations,
using approximate values of $x$, $\Delta y$ in every step, and linearizing by forming the matrix of
derivatives, $df/dx$ (the Jacobian matrix) and possibly also $df/dy$ at the last point of
approximation.

Note that as the iterative improvements $\delta x$, $\delta y$ tend towards zero (if the process converges),
$\Delta y$ converges towards a final value which enters the minimum equation above.

Algorithms avoiding the explicit calculation of $df/dx$ and $df/dy$ have also been investigated,
e.g. [1]; for a discussion, see [2]. Where convergence (or control over convergence) is
problematic, use of a general package for minimization may be indicated.


REFERENCES 


1. M.L. Ralston and R.I. Jennrich, Dud, a Derivative-free Algorithm for Non-linear Least Squares, 
Technometrics 20-1 (1978) 7. 

2. W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C, 
Second edition, Cambridge University Press, 1995. 


Note: This entry is based on content from The Data Analysis Briefbook
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216.27 linear algebra 


Linear algebra is the branch of mathematics devoted to the theory of linear structure. The
axiomatic treatment of linear structure is based on the notions of a linear space (more
commonly known as a vector space), and a linear mapping. Broadly speaking, there are two
fundamental questions considered by linear algebra:
• the solution of a linear equation, and
• diagonalization, a.k.a. the eigenvalue problem.

From the geometric point of view, "linear" is synonymous with "straight", and consequently
linear algebra can be regarded as the branch of mathematics dealing with lines and planes,
as well as with transformations of space that preserve "straightness", e.g. rotations and
reflections. The two fundamental questions, in geometric terms, deal with
• the intersection of hyperplanes, and
• the principal axes of an ellipsoid.


Linearity is a very basic notion, and consequently linear algebra has applications in numerous
areas of mathematics, science, and engineering. Diverse disciplines, such as differential equations,
differential geometry, the theory of relativity, quantum mechanics, electrical circuits,
computer graphics, and information theory benefit from the notions and techniques of linear
algebra.

Euclidean geometry is related to a specialized branch of linear algebra that deals with linear
measurement. Here the relevant notions are length and angle. A typical question is the
determination of lines perpendicular to a given plane. A somewhat less specialized branch deals
with affine structure, where the key notion is that of area and volume. Here determinants
play an essential role.

Yet another branch of linear algebra is concerned with computation, algorithms, and
numerical approximation. Important examples of such techniques include: Gaussian elimination,
the method of least squares, LU factorization, QR decomposition, Gram-Schmidt orthogonalization,
singular value decomposition, and a number of iterative algorithms for the calculation of
eigenvalues and eigenvectors.


Syllabus. 


The following subject outline is meant to serve as a survey of some key topics in linear algebra 
(Warning: the choice of topics is far from comprehensive, and no doubt reflects the biases 
of the present author’s background). As such, it may (or may not) be of use to motivated 
auto-didacts interested in deepening their understanding of the subject. 


1. Linear structure.

(a) Introduction: systems of linear equations, Gaussian elimination, matrices, matrix operations.

(b) Foundations: fields and vector spaces, subspace, linear independence, basis,
decomposition.
(c) Linear mappings: linearity axioms, kernels and images, injectivity, surjectivity,
bijections, compositions, inverses, matrix representations, change of basis,
conjugation, similarity.

2. Affine structure.

(a) Determinants: characterizing properties, cofactor expansion, permutations, Cramer's rule,
classical adjoint.

(b) Geometric aspects: Euclidean volume, equiaffine transformations,
determinants as geometric invariants of linear transformations.

3. Diagonalization.

(a) Basic notions: eigenvector, eigenvalue, eigenspace, characteristic polynomial.
(b) Obstructions: imaginary eigenvalues, nilpotent transformations, classification
of 2-dimensional transformations.

(c) Structure theory: invariant subspaces, Cayley-Hamilton theorem, Jordan canonical form,
rational canonical form.


4.
(a) Foundations: vector space dual, bilinearity, Gram-Schmidt
orthogonalization.

(b) Bilinearity: bilinear forms, symmetric bilinear forms, quadratic forms, signature
and Sylvester's theorem, orthogonal transformations, skew-symmetric bilinear forms,
symplectic transformations.

(c) Tensor algebra: tensor product, invariants of linear transformations,
operations.


5. Euclidean and structure. 


(a) Foundations: inner product axioms, the adjoint operation, symmetric transformations,
skew-symmetric transformations, self-adjoint transformations, normal
transformations.

(b) Spectral theorem: diagonalization of self-adjoint transformations, diagonalization
of quadratic forms.

6. Computational and numerical methods.

(a) Linear problems: LU-factorization, QR decomposition, least squares, Householder transformations.

(b) Eigenvalue problems: singular value decomposition, Jacobi and Gauss-Seidel
iterative algorithms.
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216.28 linear least squares 


Let $A$ be an $m \times n$ matrix with $m \ge n$ and $b$ an $m \times 1$ matrix. We want to consider the
problem

$$Ax \approx b,$$

where $\approx$ stands for the best approximate solution in the least squares sense, i.e. we want to
minimize the Euclidean norm of the residual $r = Ax - b$:

$$\|Ax - b\|_2 = \|r\|_2 = \left( \sum_{i=1}^{m} r_i^2 \right)^{1/2}.$$

We want to find the vector $x$ for which $Ax$ is the point of the column space of $A$ closest to $b$.


Among the different methods to solve this problem, we mention normal equations (sometimes
ill-conditioned), QR decomposition, and, most generally, singular value decomposition. For
further reading, see [Golub89], [Branham90], [Wong92], [Press95].


Example: Let us consider the problem of finding the closest point to measurements
on straight lines (e.g. trajectories emanating from a particle collision). This problem can
be described by $Ax = b$ with

$$A = \begin{pmatrix} a_{11} & a_{12} \\ \vdots & \vdots \\ a_{m1} & a_{m2} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix},$$

where $x$ holds the two coordinates of the unknown point.

This is an overdetermined system of linear equations, with more equations than
unknowns, a frequently occurring problem in experimental data analysis. The system is,
however, not very inconsistent, and there is a point that lies "nearly" on all straight lines. The
solution can be found with the linear least squares method, e.g. by QR decomposition for
solving $Ax = b$:

$$QRx = b \longrightarrow x = R^{-1} Q^T b.$$
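For the two-unknown case of the example, the least squares point can be computed by hand-sized arithmetic. The sketch below is an illustration under our own assumptions: it uses the normal equations $A^T A x = A^T b$ (one of the methods mentioned above, chosen here for brevity instead of the QR route), solves the resulting $2 \times 2$ system by Cramer's rule, and the function name `least_squares_2d` and the three sample lines are hypothetical.

```python
def least_squares_2d(a, b):
    """Minimise ||A x - b||_2 for an m x 2 matrix A via the normal equations."""
    # Entries of the symmetric 2 x 2 matrix A^T A.
    s11 = sum(r[0] * r[0] for r in a)
    s12 = sum(r[0] * r[1] for r in a)
    s22 = sum(r[1] * r[1] for r in a)
    # Entries of the right-hand side A^T b.
    t1 = sum(r[0] * bi for r, bi in zip(a, b))
    t2 = sum(r[1] * bi for r, bi in zip(a, b))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("columns of A are linearly dependent")
    # Cramer's rule for the 2 x 2 system.
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


# Three slightly inconsistent lines: x1 = 1, x2 = 2, x1 + x2 = 3.3.
A = [[1, 0], [0, 1], [1, 1]]
b = [1, 2, 3.3]
point = least_squares_2d(A, b)
print(point)
```

The computed point lies near all three lines; with the consistent right-hand side `[1, 2, 3]` the same function returns the exact intersection. For ill-conditioned problems the QR or singular value approaches named in the text are the safer choice.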


References 


• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)


Wong92 S.S.M. Wong, Computational Methods in Physics and Engineering, Prentice Hall, 1992. 


Golub89 Gene H. Golub and Charles F. van Loan: Matrix Computations, 2nd edn., The Johns
Hopkins University Press, 1989.


Branham90 R.L. Branham, Scientific Data Analysis, An Introduction to Overdetermined Systems, 
Springer, Berlin, Heidelberg, 1990. 


Press95 W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes 
in C, Second edition, Cambridge University Press, 1995. (The same book exists for 
the Fortran language). There is also an Internet version which you can work from. 
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216.29 linear manifold 


Definition [1] Suppose $V$ is a vector space and suppose that $L$ is a non-empty subset of $V$.
If there exists a $v \in V$ such that $L + v = \{v + l \mid l \in L\}$ is a vector subspace of $V$, then $L$
is a linear manifold of $V$. Then we say that the dimension of $L$ is the dimension of $L + v$
and write $\dim L = \dim(L + v)$. If $\dim L = \dim V - 1$, then $L$ is called a hyperplane.


A linear manifold is, in other words, a linear subspace that has possibly been shifted away
from the origin. For instance, in $\mathbb{R}^2$ examples of linear manifolds are points, lines (which are
hyperplanes), and $\mathbb{R}^2$ itself.


REFERENCES 


1. R. Cristescu, Topological vector spaces, Noordhoff International Publishing, 1977. 
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216.30 matrix exponential 


The exponential of a real valued square matrix $A$, denoted by $e^A$, is defined as

$$e^A = \sum_{k=0}^{\infty} \frac{1}{k!} A^k = I + A + \frac{1}{2!} A^2 + \cdots.$$

Let us check that $e^A$ is a real valued square matrix. Suppose $M$ is a real number such that
$|A_{ij}| < M$ for all entries $A_{ij}$ of $A$. Then $|(A^2)_{ij}| < nM^2$ for all entries in $A^2$, where $n$ is
the dimension of $A$. Thus, in general, we have $|(A^k)_{ij}| < n^{k-1} M^k$ for $k \ge 1$. Since $\sum_{k=1}^{\infty} \frac{n^{k-1}}{k!} M^k$
converges, we see that the series for $e^A$ converges to a real valued $n \times n$ matrix.


Example 1. Suppose $A$ is nilpotent, i.e., $A^r = 0$ for some natural number $r$. Then

$$e^A = I + A + \frac{1}{2!} A^2 + \cdots + \frac{1}{(r-1)!} A^{r-1}.$$

Example 2. If $A$ is diagonalizable, i.e., of the form $A = LDL^{-1}$, where $D$ is a diagonal matrix,
then

$$e^A = \sum_{k=0}^{\infty} \frac{1}{k!} (LDL^{-1})^k = \sum_{k=0}^{\infty} \frac{1}{k!} L D^k L^{-1} = L e^D L^{-1}.$$
Further, if $D = \operatorname{diag}\{a_1, \cdots, a_n\}$, then $D^k = \operatorname{diag}\{a_1^k, \cdots, a_n^k\}$, whence
$$e^A = L \operatorname{diag}\{e^{a_1}, \cdots, e^{a_n}\} L^{-1}.$$

For a diagonalizable matrix $A$, it follows that $\det e^A = e^{\operatorname{trace} A}$. However, this is, in
fact, valid for all $A$.


Let $A$ be a square $n \times n$ real valued matrix. Then the matrix exponential satisfies the
following properties:

1. For the $n \times n$ zero matrix $O$, $e^O = I$, where $I$ is the $n \times n$ identity matrix.

2. If $A = L \operatorname{diag}\{a_1, \cdots, a_n\} L^{-1}$ for an invertible $n \times n$ matrix $L$, then

$$e^A = L \operatorname{diag}\{e^{a_1}, \cdots, e^{a_n}\} L^{-1}.$$

3. If $B$ is a matrix of the same type as $A$, and $A$ and $B$ commute, then $e^{A+B} = e^A e^B$.
4. The trace of $A$ and the determinant of $e^A$ are related by the formula
$$\det e^A = e^{\operatorname{trace} A}.$$
In effect, $e^A$ is always invertible. The inverse is given by

$$(e^A)^{-1} = e^{-A}.$$
5. If $\operatorname{trace} A = 0$, then $\det e^A = 1$. If, moreover, $A$ is skew-symmetric, then $e^A$ is a rotation matrix.
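The defining series can be summed directly for small matrices, which makes the nilpotent case and the determinant formula easy to check. This is a minimal sketch and not a recommended numerical method: the function name `expm_series` and the fixed number of terms are our own choices, and a truncated series is only adequate for matrices of modest norm.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def expm_series(a, terms=30):
    """Approximate e^A by the partial sum of A^k / k! for k < terms."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term, starting with A^0 / 0!
    for k in range(1, terms):
        # A^k / k! is obtained from A^(k-1) / (k-1)! by one product and one division.
        term = [[v / k for v in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


# Nilpotent example: A^2 = 0, so e^A = I + A exactly.
N = expm_series([[0, 1], [0, 0]])

# Trace-zero example: det e^A should equal e^0 = 1.
E = expm_series([[1, 2], [3, -1]])
det_E = E[0][0] * E[1][1] - E[0][1] * E[1][0]

print(N)
print(det_E)
```

The second example has trace zero but is not skew-symmetric, so $e^A$ has determinant 1 without being a rotation.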
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216.31 matrix operations 


A matrix is an array, or a rectangular grid, of numbers. An $m \times n$ matrix is one which has
$m$ rows and $n$ columns. Examples of matrices include:

• The $2 \times 3$ matrix
$$A = \begin{pmatrix} 1 & -2 & 2 \\ 1 & 0 & 2 \end{pmatrix}$$

• The $3 \times 3$ matrix
$$B = \begin{pmatrix} 1 & 0 & 1 \\ 3 & -1 & 5 \\ 4 & 2 & 0 \end{pmatrix}$$

• The $3 \times 1$ matrix
$$C = \begin{pmatrix} 7 \\ 7 \\ 2 \end{pmatrix}$$

• The $1 \times 2$ matrix
$$D = \begin{pmatrix} \tfrac{1}{2} & 2 \end{pmatrix}$$


All of our example matrices (except the last one) have entries which are integers. In general,
matrices are allowed to have their entries taken from any ring $R$. The set of all $m \times n$
matrices with entries in a ring $R$ is denoted $M_{m \times n}(R)$. If a matrix has exactly as many rows
as it has columns, we say it is a square matrix.


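Addition of two matrices is allowed provided that both matrices have the same number of
rows and the same number of columns. The sum of two such matrices is obtained by adding
their respective entries. For example,

$$\begin{pmatrix} 7 \\ 7 \\ 2 \end{pmatrix} + \begin{pmatrix} -1 \\ 4.5 \\ 0 \end{pmatrix} = \begin{pmatrix} 7 + (-1) \\ 7 + 4.5 \\ 2 + 0 \end{pmatrix} = \begin{pmatrix} 6 \\ 11.5 \\ 2 \end{pmatrix}.$$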


Multiplication of two matrices is allowed provided that the number of columns of the first
matrix equals the number of rows of the second matrix. (For example, multiplication of a
$2 \times 3$ matrix with a $3 \times 3$ matrix is allowed, but multiplication of a $3 \times 3$ matrix with a $2 \times 3$ matrix
is not allowed, since the first matrix has 3 columns, and the second matrix has 2 rows, and
3 doesn't equal 2.) In this case the matrix multiplication is defined by

$$(AB)_{ij} = \sum_{k} (A)_{ik} (B)_{kj}.$$


We will describe how matrix multiplication works with an example. Let

$$A = \begin{pmatrix} 1 & -2 & 2 \\ 1 & 0 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 & 1 \\ 3 & -1 & 5 \\ 4 & 2 & 0 \end{pmatrix}$$

be the two matrices that we used above as our very first two examples of matrices. Since $A$
is a $2 \times 3$ matrix, and $B$ is a $3 \times 3$ matrix, it is legal to multiply $A$ and $B$, but it is not legal
to multiply $B$ and $A$. The method for computing the product $AB$ is to place $A$ below and
to the left of $B$, as follows:

$$\begin{array}{cc} & \begin{pmatrix} 1 & 0 & 1 \\ 3 & -1 & 5 \\ 4 & 2 & 0 \end{pmatrix} \\ \begin{pmatrix} 1 & -2 & 2 \\ 1 & 0 & 2 \end{pmatrix} & \begin{pmatrix} X & Y & \cdot \\ \cdot & \cdot & \cdot \end{pmatrix} \end{array}$$

$A$ is always in the bottom left corner, $B$ is in the top right corner, and the product, $AB$, is
always in the bottom right corner. We see from the picture that $AB$ will be a $2 \times 3$ matrix.
(In general, $AB$ has as many rows as $A$, and as many columns as $B$.)


Let us compute the top left entry of AB, denoted by X in the above picture. The way to 


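calculate this entry of $AB$ (or any other entry) is to take the dot product of the stuff above
it [which is $(1, 3, 4)$] and the stuff to the left of it [which is $(1, -2, 2)$]. In this case, we have
\[ X = 1 \cdot 1 + 3 \cdot (-2) + 4 \cdot 2 = 3. \]

Similarly, the top middle entry of $AB$ (where the $Y$ is in the above picture) is gotten by
taking the dot product of the stuff above it: $(0, -1, 2)$, and the stuff to the left of it: $(1, -2, 2)$,
which gives
\[ Y = 0 \cdot 1 + (-1) \cdot (-2) + 2 \cdot 2 = 6. \]

Continuing in this way, we can compute every entry of $AB$ one by one to get
\[
\begin{array}{cc}
 & \begin{pmatrix} 1 & 0 & 1 \\ 3 & -1 & 5 \\ 4 & 2 & 0 \end{pmatrix} \\
\begin{pmatrix} 1 & -2 & 2 \\ 1 & 0 & 2 \end{pmatrix} &
\begin{pmatrix} 3 & 6 & -9 \\ 9 & 4 & 1 \end{pmatrix}
\end{array}
\]
and so
\[ AB = \begin{pmatrix} 3 & 6 & -9 \\ 9 & 4 & 1 \end{pmatrix}. \]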

If one tries to compute the illegal product BA using this procedure, one winds up with 


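\[
\begin{array}{cc}
 & \begin{pmatrix} 1 & -2 & 2 \\ 1 & 0 & 2 \end{pmatrix} \\
\begin{pmatrix} 1 & 0 & 1 \\ 3 & -1 & 5 \\ 4 & 2 & 0 \end{pmatrix} &
\begin{pmatrix} ? & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{pmatrix}
\end{array}
\]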


The top left entry of this illegal product (marked with a ? above) would have to be the dot 
product of the stuff above it: (1,1), and the stuff to the left of it: (1,0,1), but these 
do not have the same length, so it is impossible to take their dot product, and consequently 
it is impossible to take the product of the matrices BA. 
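The row-by-column rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `mat_mul` and the list-of-rows representation are assumptions made here, not part of the entry. The matrices are the $A$ (rows $(1,-2,2)$ and $(1,0,2)$) and $B$ of the worked example.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows (hypothetical helper)."""
    # The product is only defined when columns of a == rows of b.
    if len(a[0]) != len(b):
        raise ValueError("columns of the first matrix must equal rows of the second")
    cols_b = list(zip(*b))  # columns of b, i.e. "the stuff above" each entry
    # Each entry is the dot product of a row of a with a column of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


A = [[1, -2, 2], [1, 0, 2]]
B = [[1, 0, 1], [3, -1, 5], [4, 2, 0]]
AB = mat_mul(A, B)  # a 2 x 3 matrix
```

Calling `mat_mul(B, A)` raises `ValueError`, which mirrors the impossibility of taking the dot product of vectors of different lengths.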




Under the correspondence of matrices and linear transformations, one can show that matrix
multiplication corresponds to composition of linear transformations, which explains why
matrix multiplication is defined in a manner which is so odd at first sight, and why this
strange manner of multiplication is so useful in mathematics.
216.32 nilpotent matrix

The square matrix $A$ is said to be nilpotent if
$A^n = \underbrace{AA \cdots A}_{n \text{ times}} = 0$ for some positive integer $n$
(here $0$ denotes the matrix where every entry is 0).

Theorem 28 (Characterization of nilpotent matrices). A matrix is nilpotent iff its
eigenvalues are all 0.

Assume $A^n = 0$. Let $\lambda$ be an eigenvalue of $A$. Then $Ax = \lambda x$ for some nonzero vector $x$.
By induction $\lambda^n x = A^n x = 0$, so $\lambda = 0$.

Conversely, suppose that all eigenvalues of $A$ are zero. Then the characteristic polynomial
of $A$ is $\det(\lambda I - A) = \lambda^n$. It now follows from the Cayley-Hamilton theorem that $A^n = 0$.

Since the determinant is the product of the eigenvalues, it follows that a nilpotent matrix has
determinant 0. Similarly, since the trace of a square matrix is the sum of the eigenvalues, it
follows that it has trace 0.

One class of nilpotent matrices is the strictly triangular matrices (lower or upper); this
follows from the fact that the eigenvalues of a triangular matrix are the diagonal elements,
and thus are all zero in the case of strictly triangular matrices.

Note that for $2 \times 2$ matrices $A$ the theorem implies that $A$ is nilpotent iff $A = 0$ or $A^2 = 0$.
It is also worth noticing that any matrix that is similar to a nilpotent matrix is nilpotent.
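As a small illustration of the statements above, the following sketch tests nilpotency of an $n \times n$ integer matrix by computing $A, A^2, \ldots, A^n$; by the proof of the theorem, a nilpotent $n \times n$ matrix already satisfies $A^n = 0$. The helper names are hypothetical and exact integer arithmetic is assumed.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def is_nilpotent(a):
    """Return True if some power A^m with 1 <= m <= n is the zero matrix."""
    n = len(a)
    power = a
    for _ in range(n):
        if all(x == 0 for row in power for x in row):
            return True
        power = mat_mul(power, a)
    return False


# A strictly upper triangular matrix: all eigenvalues (diagonal entries) are 0.
strictly_upper = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]
# A diagonal matrix with eigenvalue 1 is not nilpotent.
not_nilpotent = [[1, 0], [0, 0]]
```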
216.33 nilpotent transformation

A linear transformation $N : U \to U$ is called nilpotent if there exists a $k \in \mathbb{N}$ such that
\[ N^k = 0. \]
A nilpotent transformation naturally determines a flag of subspaces
\[ \{0\} \subset \ker N^1 \subset \ker N^2 \subset \ldots \subset \ker N^{k-1} \subset \ker N^k = U \]
and a signature
\[ 0 = n_0 < n_1 < n_2 < \ldots < n_{k-1} < n_k = \dim U, \qquad n_i = \dim \ker N^i. \]

The signature is governed by the following constraint, and characterizes $N$ up to linear
isomorphism.

Proposition 7. A sequence of increasing natural numbers is the signature of a nilpotent
transformation if and only if
\[ n_{j+1} - n_j \le n_j - n_{j-1} \]
for all $j = 1, \ldots, k-1$. Equivalently, there exists a basis of $U$ such that the matrix of $N$
relative to this basis is block diagonal
\[
\begin{pmatrix}
N_1 & 0 & \ldots & 0 \\
0 & N_2 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & N_k
\end{pmatrix},
\]
where each block has the form
\[
N_i = \begin{pmatrix}
0 & 1 & 0 & \ldots & 0 & 0 \\
0 & 0 & 1 & \ldots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \ldots & 1 & 0 \\
0 & 0 & 0 & \ldots & 0 & 1 \\
0 & 0 & 0 & \ldots & 0 & 0
\end{pmatrix}.
\]

Letting $d_i$ denote the number of blocks of size $i$, the signature of $N$ is given by
\[ n_i = n_{i-1} + d_i + d_{i+1} + \ldots + d_k, \qquad i = 1, \ldots, k. \]
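The recurrence for the signature can be sketched directly in code. The function name `signature` and the input format (a list of block sizes) are assumptions made for this illustration only.

```python
def signature(block_sizes):
    """Signature [n_0, ..., n_k] of a nilpotent map with the given block sizes."""
    k = max(block_sizes)
    # d[i] = number of blocks of size i (index 0 is unused).
    d = [0] * (k + 1)
    for size in block_sizes:
        d[size] += 1
    n = [0]
    for i in range(1, k + 1):
        # n_i = n_{i-1} + d_i + d_{i+1} + ... + d_k
        n.append(n[-1] + sum(d[i:]))
    return n


sig = signature([3, 1])  # one block of size 3 and one block of size 1
# The increments n_{j+1} - n_j should never grow, as Proposition 7 requires.
increments = [b - a for a, b in zip(sig, sig[1:])]
```

For block sizes 3 and 1 the space has dimension 4, the kernel of $N$ has dimension 2 (one dimension per block), and the kernel grows by one dimension at each of the next two steps.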
216.34 non-zero vector

A non-zero vector in a vector space $V$ is a vector that is not equal to the zero vector in
$V$.

216.35 off-diagonal entry

Let $A = (a_{ij})$ be a square matrix. An element $a_{ij}$ is an off-diagonal entry if $a_{ij}$ is not on
the diagonal, i.e., if $i \ne j$.
216.36 orthogonal matrices

A real square $n \times n$ matrix $Q$ is orthogonal if $Q^T Q = I$, i.e., if $Q^{-1} = Q^T$. The rows and
columns of an orthogonal matrix form an orthonormal basis.

Orthogonal matrices play a very important role in linear algebra. Inner products are
preserved under an orthogonal transform: $(Qx)^T Qy = x^T Q^T Q y = x^T y$, and also the
Euclidean norm: $\|Qx\|_2 = \|x\|_2$. An example of where this is useful is solving the
least squares problem $Ax \approx b$ by solving the equivalent problem $Q^T A x \approx Q^T b$.

Orthogonal matrices can be thought of as the real case of unitary matrices. A unitary
matrix $U \in \mathbb{C}^{n \times n}$ has the property $U^* U = I$, where $U^* = \overline{U}^T$ (the conjugate transpose).
Since $\overline{Q}^T = Q^T$ for real $Q$, orthogonal matrices are unitary.

An orthogonal matrix $Q$ has $\det(Q) = \pm 1$.

Important orthogonal matrices are Givens rotations and Householder transformations. They
help us maintain numerical stability because they do not amplify rounding errors.

Orthogonal $2 \times 2$ matrices are rotations or reflections if they have the form:
\[
\begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix}
\quad \text{or} \quad
\begin{pmatrix} \cos\alpha & \sin\alpha \\ \sin\alpha & -\cos\alpha \end{pmatrix}
\]
respectively.
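The defining property $Q^T Q = I$ and the determinant $\pm 1$ can be checked numerically for the two $2 \times 2$ families. This is a rough sketch with hypothetical helper names; the sign convention chosen for the rotation (upper right entry $+\sin\alpha$) is an assumption.

```python
import math


def rotation(alpha):
    """2 x 2 rotation matrix (sign convention assumed for this sketch)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s], [-s, c]]


def reflection(alpha):
    """2 x 2 reflection matrix."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s], [s, -c]]


def gram(q):
    """Return Q^T Q for a 2 x 2 matrix Q."""
    cols = list(zip(*q))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]


def det2(q):
    """Determinant of a 2 x 2 matrix."""
    return q[0][0] * q[1][1] - q[0][1] * q[1][0]


def is_identity(m, tol=1e-12):
    """Check that m is the 2 x 2 identity up to rounding."""
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(2) for j in range(2))
```

Up to rounding, `gram(rotation(a))` and `gram(reflection(a))` are the identity for any angle `a`, while `det2` returns $+1$ for the rotation and $-1$ for the reflection.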


REFERENCES 


1. Friedberg, Insell, Spence. Linear Algebra. Prentice-Hall Inc., 1997. 


This entry is based on content from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html).
216.37 orthogonal vectors

Two vectors, $v_1$ and $v_2$, are orthogonal if and only if their inner product $\langle v_1, v_2 \rangle$ is 0.
In two dimensions, orthogonal vectors are perpendicular (or, in $n$ dimensions, perpendicular
in the plane defined by the two vectors).


A set of vectors is orthogonal when, taken pairwise, any two vectors in the set are orthogonal. 
216.38 overdetermined

An overdetermined system of linear equations has more equations than unknowns. In
general, overdetermined systems have no solution. In some cases, linear least squares may
be used to find an approximate solution.
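To make the least-squares remark concrete, here is a minimal sketch for an overdetermined system with two unknowns. It uses the normal equations $A^T A x = A^T b$ solved by Cramer's rule, which is one standard approach and is chosen here purely for brevity; the function name and the sample data are hypothetical.

```python
from fractions import Fraction


def least_squares_2(a_rows, b):
    """Least-squares solution of A x = b (two unknowns) via the normal equations."""
    # Entries of the symmetric 2 x 2 matrix A^T A ...
    p = sum(Fraction(r[0]) * r[0] for r in a_rows)
    q = sum(Fraction(r[0]) * r[1] for r in a_rows)
    s = sum(Fraction(r[1]) * r[1] for r in a_rows)
    # ... and of the vector A^T b.
    u = sum(Fraction(r[0]) * y for r, y in zip(a_rows, b))
    v = sum(Fraction(r[1]) * y for r, y in zip(a_rows, b))
    det = p * s - q * q
    if det == 0:
        raise ValueError("columns of A are linearly dependent")
    # Cramer's rule for [[p, q], [q, s]] x = [u, v].
    return ((s * u - q * v) / det, (p * v - q * u) / det)


# Three equations, two unknowns: fit x0 + x1 * t to the points (0,1), (1,2), (2,2).
A = [[1, 0], [1, 1], [1, 2]]
b = [1, 2, 2]
x = least_squares_2(A, b)
```

No exact solution exists for this data, since the three points are not collinear; the returned pair minimizes the sum of squared residuals.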
216.39 partitioned matrix

A partitioned matrix, or a block matrix, is a matrix $M$ that has been constructed from
other smaller matrices. These smaller matrices are called blocks or sub-matrices of $M$.


For instance, if we partition a $5 \times 5$ matrix $L$ into four blocks $A$, $B$, $C$, and $D$,
then we can write $L$ as
\[ L = \begin{pmatrix} A & B \\ C & D \end{pmatrix}. \]
If $A_1, \ldots, A_n$ are square matrices (of possibly different dimensions), then we define the
direct sum of the matrices $A_1, \ldots, A_n$ as the partitioned matrix
\[
\operatorname{diag}(A_1, \ldots, A_n) =
\begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_n \end{pmatrix},
\]
where the off-diagonal blocks are zero.
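The direct sum construction can be sketched as follows; the function name `direct_sum` and the list-of-rows representation are assumptions for illustration.

```python
def direct_sum(*blocks):
    """Block diagonal matrix diag(A_1, ..., A_n) built from square blocks."""
    total = sum(len(block) for block in blocks)
    result = [[0] * total for _ in range(total)]  # off-diagonal blocks stay zero
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, entry in enumerate(row):
                result[offset + i][offset + j] = entry
        offset += len(block)  # the next block starts further down the diagonal
    return result


M = direct_sum([[1, 2], [3, 4]], [[5]])
```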
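216.40 pentadiagonal matrix

An $n \times n$ pentadiagonal matrix (with $n \ge 3$) is a matrix of the form
\[
\begin{pmatrix}
c_1 & d_1 & e_1 & 0 & \cdots & \cdots & 0 \\
b_1 & c_2 & d_2 & e_2 & \ddots & & \vdots \\
a_1 & b_2 & \ddots & \ddots & \ddots & \ddots & \vdots \\
0 & a_2 & \ddots & \ddots & \ddots & e_{n-3} & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & d_{n-2} & e_{n-2} \\
\vdots & & \ddots & a_{n-3} & b_{n-2} & c_{n-1} & d_{n-1} \\
0 & \cdots & \cdots & 0 & a_{n-2} & b_{n-1} & c_n
\end{pmatrix}.
\]

It follows that a pentadiagonal matrix is determined by five vectors: one $n$-vector
$c = (c_1, \ldots, c_n)$, two $(n-1)$-vectors $b = (b_1, \ldots, b_{n-1})$ and $d = (d_1, \ldots, d_{n-1})$, and two
$(n-2)$-vectors $a = (a_1, \ldots, a_{n-2})$ and $e = (e_1, \ldots, e_{n-2})$. Hence a pentadiagonal matrix is
completely determined by $n + 2(n-1) + 2(n-2) = 5n - 6$ scalars.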
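The following sketch assembles a pentadiagonal matrix from the five vectors described above; the function name and argument order are hypothetical.

```python
def pentadiagonal(a, b, c, d, e):
    """Build the n x n pentadiagonal matrix from its five diagonals."""
    n = len(c)
    if not (len(b) == len(d) == n - 1 and len(a) == len(e) == n - 2):
        raise ValueError("diagonal lengths must be n-2, n-1, n, n-1, n-2")
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = c[i]            # main diagonal
    for i in range(n - 1):
        m[i + 1][i] = b[i]        # first subdiagonal
        m[i][i + 1] = d[i]        # first superdiagonal
    for i in range(n - 2):
        m[i + 2][i] = a[i]        # second subdiagonal
        m[i][i + 2] = e[i]        # second superdiagonal
    return m


P = pentadiagonal([1, 2], [3, 4, 5], [6, 7, 8, 9], [10, 11, 12], [13, 14])
scalars = 2 + 3 + 4 + 3 + 2  # 5n - 6 with n = 4
```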
216.41 proof of Cayley-Hamilton theorem

We begin by showing that the theorem is true if the characteristic polynomial does not have
repeated roots, and then prove the general case.

Suppose then that the discriminant of the characteristic polynomial is non-zero, and hence
that $T : V \to V$ has $n = \dim V$ distinct eigenvalues once we extend [1] to the algebraic closure
of the ground field. We can therefore choose a basis of eigenvectors, call them $v_1, \ldots, v_n$, with
$\lambda_1, \ldots, \lambda_n$ the corresponding eigenvalues. From the definition of characteristic polynomial
we have that
\[ c_T(x) = \prod_{i=1}^{n} (x - \lambda_i). \]
The factors on the right commute, and hence
\[ c_T(T) v_i = 0 \]
for all $i = 1, \ldots, n$. Since $c_T(T)$ annihilates a basis, it must, in fact, be zero.

[1] Technically, this means that we must work with the vector space $\bar{V} = V \otimes \bar{k}$, where $\bar{k}$ is the
algebraic closure of the original field of scalars, and with $\bar{T} : \bar{V} \to \bar{V}$ the extended transformation with
action
\[ \bar{T}(v \otimes a) = T(v) \otimes a, \qquad v \in V, \ a \in \bar{k}. \]

To prove the general case, let $\delta(p)$ denote the discriminant of a polynomial $p$, and let us
remark that the discriminant mapping
\[ T \mapsto \delta(c_T), \qquad T \in \operatorname{End}(V) \]
is polynomial on $\operatorname{End}(V)$. Hence the set of $T$ with distinct eigenvalues is a dense open subset
of $\operatorname{End}(V)$ relative to the Zariski topology. Now the characteristic polynomial map
\[ T \mapsto c_T(T), \qquad T \in \operatorname{End}(V) \]
is a polynomial map on the vector space $\operatorname{End}(V)$. Since it vanishes on a dense open subset,
it must vanish identically. Q.E.D.
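The statement being proved can be checked by hand in the $2 \times 2$ case, where the characteristic polynomial is $c_A(x) = x^2 - (\operatorname{tr} A)\,x + \det A$. The sketch below evaluates $c_A(A)$ with exact integer arithmetic; the helper name is hypothetical and the code is a sanity check, not part of the proof.

```python
def char_poly_at_matrix(a):
    """Evaluate A^2 - tr(A) A + det(A) I for a 2 x 2 matrix A."""
    (p, q), (r, s) = a
    trace = p + s
    det = p * s - q * r
    # A^2 computed entry by entry.
    square = [[p * p + q * r, p * q + q * s],
              [r * p + s * r, r * q + s * s]]
    identity = [[1, 0], [0, 1]]
    return [[square[i][j] - trace * a[i][j] + det * identity[i][j]
             for j in range(2)] for i in range(2)]


residual = char_poly_at_matrix([[1, 2], [3, 4]])
```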
216.42 proof of Schur decomposition

The columns of the unitary matrix $Q$ in Schur's decomposition theorem form an orthonormal basis
of $\mathbb{C}^n$. The matrix $A$ takes the upper-triangular form $D + N$ on this basis. Conversely, if
$v_1, \ldots, v_n$ is an orthonormal basis for which $A$ is of this form then the matrix $Q$ with $v_i$ as
its $i$-th column satisfies the theorem.

To find such a basis we proceed by induction on $n$. For $n = 1$ we can simply take $Q = 1$.
If $n > 1$ then let $v \in \mathbb{C}^n$ be an eigenvector of $A$ of unit length and let $V = v^\perp$ be
its orthogonal complement. If $\pi$ denotes the orthogonal projection onto the line spanned by $v$ then
$(1 - \pi)A$ maps $V$ into $V$.

By induction there is an orthonormal basis $v_2, \ldots, v_n$ of $V$ for which $(1 - \pi)A$ takes the desired
form on $V$. Now $A = \pi A + (1 - \pi)A$, so $A v_i \equiv (1 - \pi) A v_i \pmod{v}$ for $i \in \{2, \ldots, n\}$.
Then $v, v_2, \ldots, v_n$ can be used as a basis for the Schur decomposition on $\mathbb{C}^n$.
216.43 singular value decomposition

Any real $m \times n$ matrix $A$ can be decomposed into
\[ A = U S V^T \]
where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $S$ is a
unique $m \times n$ diagonal matrix with real, non-negative elements $\sigma_i$, $i = 1, \ldots, \min(m, n)$, in
descending order:
\[ \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0. \]

The $\sigma_i$ are the singular values of $A$ and the first $\min(m, n)$ columns of $U$ and $V$ are the
left and right (respectively) singular vectors of $A$. $S$ has the form:
\[
\begin{pmatrix} \Sigma \\ 0 \end{pmatrix} \text{ if } m \ge n
\quad \text{and} \quad
\begin{pmatrix} \Sigma & 0 \end{pmatrix} \text{ if } m < n,
\]
where $\Sigma$ is a diagonal matrix with the diagonal elements $\sigma_1, \sigma_2, \ldots, \sigma_{\min(m,n)}$. We assume
now $m \ge n$. If $r = \operatorname{rank}(A) < n$, then
\[ \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_n = 0. \]
If $\sigma_r \ne 0$ and $\sigma_{r+1} = \cdots = \sigma_n = 0$, then $r$ is the rank of $A$. In this case, $S$ becomes an $r \times r$
matrix, and $U$ and $V$ shrink accordingly. SVD can thus be used for rank determination.
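Rank determination from singular values can be illustrated in the $2 \times 2$ case without a linear algebra library. The sketch below relies on the standard fact, not stated in this entry, that the singular values of $A$ are the square roots of the eigenvalues of $A^T A$; the function names and the tolerance are assumptions.

```python
import math


def singular_values_2x2(m):
    """Singular values (largest first) of a real 2 x 2 matrix."""
    (a, b), (c, d) = m
    # Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    # Eigenvalues of a symmetric 2 x 2 matrix: mean +/- radius.
    mean = (p + r) / 2
    radius = math.sqrt(((p - r) / 2) ** 2 + q * q)
    # Clamp at zero to guard against tiny negative values from rounding.
    return (math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0)))


def numerical_rank(m, tol=1e-12):
    """Number of singular values above the tolerance."""
    return sum(1 for s in singular_values_2x2(m) if s > tol)
```

For example, the matrix with rows $(1, 2)$ and $(2, 4)$ has one non-zero singular value and hence rank 1.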


The SVD provides a numerically robust solution to the least-squares problem. The
matrix-algebraic phrasing of the least-squares solution $x$ is
\[ x = (A^T A)^{-1} A^T b. \]
Then utilizing the SVD by making the replacement $A = U S V^T$ we have
\[ x = V \begin{pmatrix} \Sigma^{-1} & 0 \end{pmatrix} U^T b. \]

References

• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
216.44 skew-symmetric matrix

Definition:
Let $A$ be a square matrix of dimension $n \times n$ with entries $(a_{ij})$. The matrix $A$ is
skew-symmetric if $a_{ij} = -a_{ji}$ for all $1 \le i \le n$, $1 \le j \le n$.
\[
A = \begin{pmatrix}
a_{11} = 0 & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \cdots & a_{nn} = 0
\end{pmatrix}
\]
The main diagonal entries are zero because $a_{ii} = -a_{ii}$ implies $a_{ii} = 0$.

One can see skew-symmetric matrices as a special case of complex skew-Hermitian matrices.
Thus, all properties of skew-Hermitian matrices also hold for skew-symmetric matrices.

Properties:

1. The matrix $A$ is skew-symmetric if and only if $A^t = -A$, where $A^t$ is the matrix
transpose.

2. For the trace operator, we have that $\operatorname{tr}(A) = \operatorname{tr}(A^t)$. Combining this with property
(1), it follows that $\operatorname{tr}(A) = 0$ for a skew-symmetric matrix $A$.

3. Skew-symmetric matrices form a vector space: if $A$ and $B$ are skew-symmetric and
$\alpha, \beta \in \mathbb{R}$, then $\alpha A + \beta B$ is also skew-symmetric.

4. Suppose $A$ is a skew-symmetric matrix and $B$ is a matrix of the same dimension as $A$.
Then $B^t A B$ is skew-symmetric.

5. All eigenvalues of skew-symmetric matrices are purely imaginary or zero. This result
is proven on the page for skew-Hermitian matrices.

6. According to Jacobi's Theorem, the determinant of a skew-symmetric matrix of odd
dimension is zero.

Examples:

• $\begin{pmatrix} 0 & b \\ -b & 0 \end{pmatrix}$

• $\begin{pmatrix} 0 & b & c \\ -b & 0 & e \\ -c & -e & 0 \end{pmatrix}$
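Properties 2 and 4 lend themselves to a quick numerical check. The sketch below uses hypothetical helper names and an arbitrarily chosen integer matrix $B$; it illustrates the properties and does not prove them.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def is_skew_symmetric(m):
    """True if m equals minus its transpose."""
    n = len(m)
    return all(m[i][j] == -m[j][i] for i in range(n) for j in range(n))


A = [[0, 2, -1], [-2, 0, 3], [1, -3, 0]]   # skew-symmetric
B = [[1, 2, 0], [0, 1, 4], [5, 0, 1]]      # arbitrary square matrix
congruent = mat_mul(mat_mul(transpose(B), A), B)  # B^t A B
trace_A = sum(A[i][i] for i in range(3))
```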
216.45 square matrix

A square matrix has the same number of rows as columns.

Examples:

\[
\begin{pmatrix}
1.00000 & 0.50000 & 0.33333 & 0.25000 \\
0.50000 & 0.33333 & 0.25000 & 0.20000 \\
0.33333 & 0.25000 & 0.20000 & 0.16667 \\
0.25000 & 0.20000 & 0.16667 & 0.14286
\end{pmatrix}
\]


(1) 


0.94 0.37 
90 0.16 
50 0.03 
15 0.59 
04 0.64 


0.71 
0.74 
0.07 
0.43 
0.61 


0.32 
0.83 
0.49 
0.03 
0.17 


0.58 
0.27 89 
0.550. 64 
0.76 40 
0.29 


The notation $\operatorname{Mat}_n(K)$ is often used to signify the standard class of square matrices which
are of dimension $n \times n$ with elements drawn from a field $K$. Thus, one would use
$a \in \operatorname{Mat}_3(\mathbb{C})$ to declare that $a$ is a three-by-three matrix with elements that are
complex numbers.

Property: Suppose $A$ and $B$ are matrices such that $AB$ is a square matrix. Then the
product $BA$ is defined and is also a square matrix.
216.46 strictly upper triangular matrix

A strictly upper triangular matrix is an upper triangular matrix which has 0 on the
main diagonal. Similarly, a strictly lower triangular matrix is a lower triangular
matrix which has 0 on the main diagonal, i.e.

A strictly upper triangular matrix is of the form
\[
\begin{pmatrix}
0 & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & 0 & a_{23} & \cdots & a_{2n} \\
0 & 0 & 0 & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0
\end{pmatrix}
\]

A strictly lower triangular matrix is of the form
\[
\begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
a_{21} & 0 & 0 & \cdots & 0 \\
a_{31} & a_{32} & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & 0
\end{pmatrix}
\]


Version: 4 Owner: Daume Author(s): Daume 


216.47 symmetric matrix 


Definition: 


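Let $A$ be a square matrix of dimension $n$. The matrix $A$ is symmetric if $a_{ij} = a_{ji}$ for all
$1 \le i \le n$, $1 \le j \le n$.

1. $A^t = A$, where $A^t$ is the matrix transpose.

Examples:

• $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$

• $\begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix}$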
216.48 theorem for normal triangular matrices

Theorem ([1], pp. 82) Let $A$ be a complex square matrix. Then $A$ is diagonal if and only
if $A$ is normal and triangular.

Proof. If $A$ is a diagonal matrix, then the conjugate transpose $A^*$ is also a diagonal matrix.
Since arbitrary diagonal matrices commute, it follows that $A^* A = A A^*$. Thus any diagonal
matrix is a normal triangular matrix.

Next, suppose $A = (a_{ij})$ is a normal upper triangular matrix. Thus $a_{ij} = 0$ for $i > j$, so for
the diagonal elements in $A^* A$ and $A A^*$, we obtain
\[ (A^* A)_{ii} = \sum_{k=1}^{i} |a_{ki}|^2, \]
\[ (A A^*)_{ii} = \sum_{k=i}^{n} |a_{ik}|^2. \]

For $i = 1$, we have
\[ |a_{11}|^2 = |a_{11}|^2 + |a_{12}|^2 + \cdots + |a_{1n}|^2. \]

It follows that the only entry on the first row of $A$ that can be non-zero is $a_{11}$. Similarly,
for $i = 2$, we obtain
\[ |a_{12}|^2 + |a_{22}|^2 = |a_{22}|^2 + \cdots + |a_{2n}|^2. \]

Since $a_{12} = 0$, it follows that the only entry on the second row that can be non-zero is $a_{22}$.
Repeating this argument for all rows, we see that $A$ is a diagonal matrix. Thus any normal upper
triangular matrix is a diagonal matrix.

Suppose then that $A$ is a normal lower triangular matrix. Then it is not difficult to see that
$A^*$ is a normal upper triangular matrix. Thus, by the above, $A^*$ is a diagonal matrix, whence
also $A$ is a diagonal matrix. $\Box$


REFERENCES 
1. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical Society, 1994.
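216.49 triangular matrix

An upper triangular matrix is of the form
\[
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22} & a_{23} & \cdots & a_{2n} \\
0 & 0 & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{nn}
\end{pmatrix}
\]

A lower triangular matrix is of the form
\[
\begin{pmatrix}
a_{11} & 0 & 0 & \cdots & 0 \\
a_{21} & a_{22} & 0 & \cdots & 0 \\
a_{31} & a_{32} & a_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{pmatrix}
\]

Triangular matrices allow numerous algorithmic shortcuts in many situations. For example,
$Ax = b$ can be solved in $n^2$ operations if $A$ is an $n \times n$ triangular matrix.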
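The $n^2$ operation count comes from back substitution (or forward substitution in the lower triangular case). A minimal sketch for the upper triangular case follows; the function name is hypothetical and floating-point arithmetic is assumed.

```python
def solve_upper_triangular(u, b):
    """Solve U x = b by back substitution for an upper triangular U."""
    n = len(b)
    x = [0.0] * n
    # Work from the last equation upwards; row i costs about n - i operations,
    # which gives on the order of n^2 operations in total.
    for i in range(n - 1, -1, -1):
        if u[i][i] == 0:
            raise ValueError("zero on the diagonal: the matrix is singular")
        known = sum(u[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - known) / u[i][i]
    return x


x = solve_upper_triangular([[2, 1], [0, 4]], [4, 8])
```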


Triangular matrices have the following properties (prefix "triangular" with either "upper"
or "lower" uniformly):

• The inverse of a triangular matrix is a triangular matrix.

• The product of two triangular matrices is a triangular matrix.

• The determinant of a triangular matrix is the product of the diagonal elements.

• The eigenvalues of a triangular matrix are the diagonal elements.

The last two properties follow easily from the cofactor expansion of the triangular matrix.
216.50 tridiagonal matrix

An $n \times n$ tridiagonal matrix is of the form
\[
\begin{pmatrix}
d_1 & u_1 & 0 & \cdots & 0 \\
l_2 & d_2 & u_2 & \cdots & 0 \\
0 & l_3 & d_3 & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & u_{n-1} \\
0 & 0 & \cdots & l_n & d_n
\end{pmatrix}
\]
216.51 under determined

An under determined system of linear equations has more unknowns than equations. It
can be consistent with infinitely many solutions, or have no solution.
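216.52 unit triangular matrix

A unit triangular matrix is a triangular matrix with 1 on the diagonal, i.e.

A unit upper triangular matrix is of the form
\[
\begin{pmatrix}
1 & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & 1 & a_{23} & \cdots & a_{2n} \\
0 & 0 & 1 & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}
\]

A unit lower triangular matrix is of the form
\[
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
a_{21} & 1 & 0 & \cdots & 0 \\
a_{31} & a_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & 1
\end{pmatrix}
\]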
216.53 unitary

Definitions. A unitary space $V$ is a complex vector space with a distinguished positive definite
Hermitian form
\[ \langle -, - \rangle : V \times V \to \mathbb{C}, \]
which serves as the inner product on $V$.

A unitary transformation is a surjective linear transformation $T : V \to V$ satisfying
\[ \langle u, v \rangle = \langle Tu, Tv \rangle, \qquad u, v \in V. \qquad (216.53.1) \]
These are the isometries for Euclidean space.

A unitary matrix is a square complex-valued matrix $A$, whose inverse is equal to its
conjugate transpose:
\[ A^{-1} = \bar{A}^t. \]

Remarks.

1. A standard example of a unitary space is $\mathbb{C}^n$ with inner product
\[ \langle u, v \rangle = \sum_{i=1}^{n} u_i \overline{v_i}, \qquad u, v \in \mathbb{C}^n. \qquad (216.53.2) \]

2. Unitary transformations and unitary matrices are closely related. On the one hand,
a unitary matrix defines a unitary transformation of $\mathbb{C}^n$ relative to the inner product
(216.53.2). On the other hand, the representing matrix of a unitary transformation
relative to an orthonormal basis is, in fact, a unitary matrix.

3. A unitary transformation is an automorphism. This follows from the fact that a unitary
transformation $T$ preserves the inner-product norm:
\[ \|Tu\| = \|u\|, \qquad u \in V. \qquad (216.53.3) \]
Hence, if
\[ Tu = 0, \]
then by the definition (216.53.1) it follows that
\[ \|u\| = 0, \]
and hence by the inner-product axioms that
\[ u = 0. \]
Thus, the kernel of $T$ is trivial, and therefore it is an automorphism.

4. Indeed, (216.53.3) can be taken as the definition of a unitary transformation.
This follows from the polarization identity for sesquilinear forms, namely
\[ 2 \langle u, v \rangle = \|u + v\|^2 + i \|u + iv\|^2 - (1 + i) \|u\|^2 - (1 + i) \|v\|^2. \]
The polarization identity is obtained by taking a linear combination of the following
two bilinearity relations:
\[ \langle u + v, u + v \rangle = \langle u, u \rangle + \langle u, v \rangle + \overline{\langle u, v \rangle} + \langle v, v \rangle, \]
\[ \langle u + iv, u + iv \rangle = \langle u, u \rangle - i \langle u, v \rangle + i \overline{\langle u, v \rangle} + \langle v, v \rangle. \]
Thanks to the polarization identity, it is possible to show that if $T$ preserves the norm,
then (216.53.1) must hold as well.

5. A simple example of a unitary matrix is the change of coordinates matrix between two
orthonormal bases. Indeed, let $u_1, \ldots, u_n$ and $v_1, \ldots, v_n$ be two orthonormal bases, and
let $A = (A^i_j)$ be the corresponding change of basis matrix defined by
\[ v_j = \sum_i A^i_j u_i, \qquad j = 1, \ldots, n. \]
Substituting the above relation into the defining relations for an orthonormal basis,
\[ \langle u_i, u_j \rangle = \delta_{ij}, \]
\[ \langle v_k, v_l \rangle = \delta_{kl}, \]
we obtain
\[ \sum_{ij} \delta_{ij} A^i_k \overline{A^j_l} = \sum_i A^i_k \overline{A^i_l} = \delta_{kl}. \]
In matrix notation, the above is simply
\[ A \bar{A}^t = I, \]
as desired.

6. Unitary spaces, transformations, and matrices are of fundamental importance in
quantum mechanics.
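The polarization identity of remark 4 can be spot-checked numerically with Python's built-in complex numbers, using the standard inner product (216.53.2) on $\mathbb{C}^n$. The helper names and the sample vectors are hypothetical; a passing check on one pair of vectors illustrates the identity but does not prove it.

```python
def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate-linear in v."""
    return sum(x * y.conjugate() for x, y in zip(u, v))


def norm_sq(u):
    """Squared norm ||u||^2, a non-negative real number."""
    return inner(u, u).real


def polarization_rhs(u, v):
    """Right-hand side of the identity; it should equal 2 <u, v>."""
    u_plus_v = [x + y for x, y in zip(u, v)]
    u_plus_iv = [x + 1j * y for x, y in zip(u, v)]
    return (norm_sq(u_plus_v) + 1j * norm_sq(u_plus_iv)
            - (1 + 1j) * norm_sq(u) - (1 + 1j) * norm_sq(v))


u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]
difference = polarization_rhs(u, v) - 2 * inner(u, v)
```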
216.54 vector space

Let $F$ be a field. A vector space $V$ over $F$ is a set with two operations $+ : V \times V \to V$
and $\cdot : F \times V \to V$, such that

1. $(a + b) + c = a + (b + c)$ for all $a, b, c \in V$

2. $a + b = b + a$ for all $a, b \in V$

3. There exists an element $0 \in V$ such that $a + 0 = a$ for all $a \in V$

4. For any $a \in V$, there exists an element $b \in V$ such that $a + b = 0$

5. $k_1 \cdot (k_2 \cdot v) = (k_1 k_2) \cdot v$ for all $k_1, k_2 \in F$ and $v \in V$

6. $1 \cdot v = v$ for all $v \in V$

7. $k \cdot (v + w) = (k \cdot v) + (k \cdot w)$ for all $k \in F$ and $v, w \in V$

8. $(k_1 + k_2) \cdot v = (k_1 \cdot v) + (k_2 \cdot v)$ for all $k_1, k_2 \in F$ and $v \in V$

Equivalently, a vector space is a module $V$ over a field $F$.

The elements of $V$ are called vectors, and the element $0 \in V$ is called the zero vector of $V$.
216.55 vector subspace

Definition Let $V$ be a vector space over a field $F$, and let $W$ be a nonempty subset of $V$.
If $W$ is a vector space, then $W$ is a vector subspace of $V$.

If $W$ is a nonempty subset of a vector space $V$, then a sufficient condition for $W$ to be a
subspace is that $\alpha a + \beta b \in W$ for all $a, b \in W$ and all $\alpha, \beta \in F$.

Examples

1. Every vector space contains two trivial vector subspaces: the entire vector space, and
the zero vector space.

2. If $S$ and $T$ are vector subspaces of a vector space $V$, then the vector sum
\[ S + T = \{ s + t \in V \mid s \in S, \ t \in T \} \]
and the intersection
\[ S \cap T = \{ u \in V \mid u \in S, \ u \in T \} \]
are vector subspaces of $V$.

3. Suppose $S$ and $T$ are vector spaces, and suppose $L$ is a linear mapping $L : S \to T$.
Then $\operatorname{Img} L$ is a vector subspace of $T$, and $\operatorname{Ker} L$ is a vector subspace of $S$.
Results for vector subspaces

Theorem 1 [1] Let $V$ be a finite dimensional vector space. If $W$ is a vector subspace of $V$
and $\dim W = \dim V$, then $W = V$.

Theorem 2 [2] (Dimension theorem for subspaces) Let $V$ be a finite dimensional vector
space with subspaces $S$ and $T$. Then
\[ \dim(S + T) + \dim(S \cap T) = \dim S + \dim T. \]

Theorem 3 ([1], page 42) (Dimension theorem for a composite mapping) Suppose $U, V, W$
are finite dimensional vector spaces, and $L : U \to V$ and $M : V \to W$ are linear mappings.
Then
\[
\dim(\operatorname{Img} L \cap \operatorname{Ker} M) = \dim \operatorname{Img} L - \dim \operatorname{Img} ML
= \dim \operatorname{Ker} ML - \dim \operatorname{Ker} L.
\]

REFERENCES

1. S. Lang, Linear Algebra, Addison-Wesley, 1966.
2. W.E. Deskins, Abstract Algebra, Dover Publications, 1995.
3. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical Society, 1994.
216.56 zero map

Definition Suppose $X$ is a set, and $Y$ is a vector space with zero vector 0. If $T$ is a map
$T : X \to Y$ such that $T(x) = 0$ for all $x$ in $X$, then $T$ is a zero map.

Examples

1. On the set of non-invertible $n \times n$ matrices, the determinant is a zero map.
2. If $X$ is the zero vector space, any linear map $T : X \to Y$ is a zero map. In fact,
$T(0) = T(0 \cdot 0) = 0\,T(0) = 0$.
216.57 zero vector in a vector space is unique

Theorem The zero vector in a vector space is unique.

Proof. Suppose $0$ and $\tilde{0}$ are zero vectors in a vector space $V$. Then both $0$ and $\tilde{0}$ must
satisfy axiom 3, i.e., for all $v \in V$,
\[ v + 0 = v, \]
\[ v + \tilde{0} = v. \]

Setting $v = \tilde{0}$ in the first equation, and $v = 0$ in the second, yields $\tilde{0} + 0 = \tilde{0}$ and
$0 + \tilde{0} = 0$. Thus, using axiom 2,
\[ 0 = 0 + \tilde{0} = \tilde{0} + 0 = \tilde{0}, \]
and $0 = \tilde{0}$.
216.58 zero vector space

Definition A zero vector space is a vector space that contains only one element, a
zero vector.

Properties

1. Every vector space has a zero vector space as a vector subspace.

2. A vector space $X$ is a zero vector space if and only if the dimension of $X$ is zero.

3. Any linear map defined on a zero vector space is the zero map. If $T$ is linear on $\{0\}$,
then $T(0) = T(0 \cdot 0) = 0\,T(0) = 0$.
Chapter 217


15-01 — Instructional exposition 
(textbooks, tutorial papers, etc.) 


217.1 circulant matrix 


A square matrix $M : A \times A \to C$ is said to be circulant if for some permutation $\sigma$ of
$A$, we have
\[ M(\sigma(x), \sigma(y)) = M(x, y) \qquad \forall x, y \in A \]
or equivalently
\[ M(\sigma(x), y) = M(x, \sigma^{-1}(y)) \qquad \forall x, y \in A. \]

The same term is in use in a more restrictive sense, when the indexing set $A$ is $\{1, 2, \ldots, d\}$.
A matrix of the form
\[
\begin{pmatrix}
M_1 & M_2 & M_3 & \ldots & M_d \\
M_d & M_1 & M_2 & \ldots & M_{d-1} \\
M_{d-1} & M_d & M_1 & \ldots & M_{d-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
M_2 & M_3 & M_4 & \ldots & M_1
\end{pmatrix}
\]
is called circulant. This concurs with the first definition, since we can define the permutation
$\sigma : A \to A$ by
\[ \sigma(x) = x + 1, \quad 1 \le x < d, \]
\[ \sigma(d) = 1. \]

Because the Jordan decomposition of a circulant matrix is rather simple, circulant matrices
have some interest in connection with the approximation of eigenvalues of more general
matrices. In particular, they have become part of the standard apparatus in the computerized
analysis of signals and images.
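The two descriptions of a circulant matrix can be compared in code: build the matrix from its first row, then check invariance under the cyclic shift $\sigma$. The sketch uses zero-based indices, so $\sigma(x) = x + 1 \bmod d$; the function names are hypothetical.

```python
def circulant(first_row):
    """Circulant matrix whose first row is (M_1, ..., M_d)."""
    d = len(first_row)
    # Entry (i, j) depends only on (j - i) mod d.
    return [[first_row[(j - i) % d] for j in range(d)] for i in range(d)]


def is_shift_invariant(m):
    """Check M(sigma(x), sigma(y)) == M(x, y) for the cyclic shift sigma."""
    d = len(m)
    return all(m[(i + 1) % d][(j + 1) % d] == m[i][j]
               for i in range(d) for j in range(d))


M = circulant([1, 2, 3, 4])
```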
217.2 matrix

A matrix is simply a mapping $M : A \times B \to C$ of the product of two sets into some third
set. As a rule, though, the word matrix and the notation associated with it are used only in
connection with linear mappings. In such cases $C$ is the ring or field of scalars.

Matrix of a linear mapping

Definition: Let $V$ and $W$ be finite-dimensional vector spaces over the same field $k$, with
bases $A$ and $B$ respectively, and let $f : V \to W$ be a linear mapping. For each $a \in A$ let
$(M_{ab})_{b \in B}$ be the unique family of scalars (elements of $k$) such that
\[ f(a) = \sum_{b \in B} M_{ab} \, b. \]

Then the family $(M_{ab})$ (or equivalently the mapping $(a, b) \mapsto M_{ab}$ of $A \times B \to k$) is called
the matrix of $f$ with respect to the given bases $A$ and $B$. The scalars $M_{ab}$ are called the
components of the matrix.

The matrix describes the function $f$ completely; for any element
\[ x = \sum_{a \in A} x_a \, a \]
of $V$, we have
\[ f(x) = \sum_{a \in A} \sum_{b \in B} x_a M_{ab} \, b \]
as is readily verified.


Any two linear mappings V — W have a sum, defined pointwise; it is easy to verify that the 
matrix of the sum is the sum, componentwise, of the two given matrices. 


The formalism of matrices extends somewhat to linear mappings between modules, i.e.
extends to a ring $k$, not necessarily commutative, rather than just a field.

Rows and columns; product of two matrices

Suppose we are given three modules $V, W, X$, with bases $A, B, C$ respectively, and two linear
mappings $f : V \to W$ and $g : W \to X$. $f$ and $g$ have some matrices $(M_{ab})$ and $(N_{bc})$ with
respect to those bases. The product matrix $NM$ is defined as the matrix $(P_{ac})$ of the
function
\[ x \mapsto g(f(x)), \qquad V \to X \]
with respect to the bases $A$ and $C$. Straight from the definitions of a linear mapping and a
basis, one verifies that
\[ P_{ac} = \sum_{b \in B} M_{ab} N_{bc} \qquad (217.2.1) \]
for all $a \in A$ and $c \in C$.
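Since a matrix is here a mapping from pairs of basis elements to scalars, formula (217.2.1) can be sketched with dictionaries keyed by such pairs. The basis labels, the sample maps $f$ and $g$, and the function name are all hypothetical choices made for this illustration.

```python
def compose_matrices(m, n, basis_a, basis_b, basis_c):
    """Matrix P of x -> g(f(x)): P[a, c] = sum over b of M[a, b] * N[b, c]."""
    return {(a, c): sum(m[a, b] * n[b, c] for b in basis_b)
            for a in basis_a for c in basis_c}


basis_a = ["a1", "a2"]
basis_b = ["b1", "b2", "b3"]
basis_c = ["c1", "c2"]

# f(a1) = b1 + 2 b3 and f(a2) = -b2.
M = {("a1", "b1"): 1, ("a1", "b2"): 0, ("a1", "b3"): 2,
     ("a2", "b1"): 0, ("a2", "b2"): -1, ("a2", "b3"): 0}
# g(b1) = c1, g(b2) = c1 + c2 and g(b3) = 3 c2.
N = {("b1", "c1"): 1, ("b1", "c2"): 0,
     ("b2", "c1"): 1, ("b2", "c2"): 1,
     ("b3", "c1"): 0, ("b3", "c2"): 3}

P = compose_matrices(M, N, basis_a, basis_b, basis_c)
```

Reading off the result, $g(f(a_1)) = c_1 + 6 c_2$ and $g(f(a_2)) = -c_1 - c_2$, which agrees with composing the two maps by hand.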


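To illustrate the notation of matrices in terms of rows and columns, suppose the spaces
$V, W, X$ have dimensions 2, 3, and 2 respectively, and bases
\[ A = \{a_1, a_2\} \qquad B = \{b_1, b_2, b_3\} \qquad C = \{c_1, c_2\}. \]

We write
\[
\begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \end{pmatrix}
\begin{pmatrix} N_{11} & N_{12} \\ N_{21} & N_{22} \\ N_{31} & N_{32} \end{pmatrix}
= \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}.
\]

(Notice that we have taken a liberty with the notation, by writing e.g. $M_{12}$ instead of
$M_{a_1 b_2}$.) The equation (217.2.1) shows that the multiplication of two matrices proceeds
"rows by columns". Also, in an expression such as $N_{32}$, the first index refers to the row, and
the second to the column, in which that component appears.

Similar notation can describe the calculation of $f(x)$ whenever $f$ is a linear mapping. For
example, if $f : V \to W$ is linear, and $x = \sum_i x_i a_i$ and $f(x) = \sum_i y_i b_i$, we write
\[
\begin{pmatrix} x_1 & x_2 \end{pmatrix}
\begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \end{pmatrix}
= \begin{pmatrix} y_1 & y_2 & y_3 \end{pmatrix}.
\]

When, as above, a "row vector" denotes an element of a space, a "column vector" denotes
an element of the dual space. If, say, $\bar{f} : W^* \to V^*$ is the transpose of $f$, then, with respect
to the bases dual to $A$ and $B$, an equation $\bar{f}(\sum_j \nu_j \beta_j) = \sum_i \mu_i \alpha_i$ may be written
\[
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}
= \begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \end{pmatrix}
\begin{pmatrix} \nu_1 \\ \nu_2 \\ \nu_3 \end{pmatrix}.
\]

One more illustration: Given a bilinear form $L : V \times W \to k$, we can denote $L(v, w)$ by
\[
\begin{pmatrix} v_1 & v_2 \end{pmatrix}
\begin{pmatrix} L_{11} & L_{12} & L_{13} \\ L_{21} & L_{22} & L_{23} \end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix}.
\]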


square matrix 


A matrix $M : A \times B \to C$ is called square if $A = B$, or if some bijection $A \to B$ is implicit
in the context. (It is not enough for $A$ and $B$ to be equipotent.) Square matrices naturally
arise in connection with a linear mapping of a space into itself (called an endomorphism),
and in the related case of a change of basis (from one basis of some space, to another basis
of the same space).

Miscellaneous usages of "matrix"

The word matrix has come into use in some areas where linear mappings are not at issue.
An example would be a combinatorial statement, such as Hall's marriage theorem, phrased
in terms of "0-1 matrices" instead of subsets of $A \times B$.
Remark

Matrices are heavily used in the physical sciences, engineering, statistics, and computer
programming. But for purely mathematical purposes, they are less important than one might
expect, and indeed are frequently irrelevant in linear algebra. Linear mappings,
transposes, and a number of other simple notions can and should be defined without
matrices, simply because they have a meaning independent of any basis or bases. Many little
theorems in linear algebra can be proved in a simpler and more enlightening way without
matrices than with them. One more illustration: The derivative (at a point) of a mapping
from one surface to another is a linear mapping; it is not a matrix of partial derivatives,
because the matrix depends on a choice of basis but the derivative does not.
Chapter 218


15-X X — Linear and multilinear 
algebra; matrix theory 


218.1 linearly dependent functions 


Let $f_1, f_2, f_3, \ldots, f_n$ be real valued functions defined on some $I \subset \mathbb{R}$. Then $f_1, f_2, f_3, \ldots, f_n$ are
said to be linearly dependent if for some $a_1, a_2, a_3, \ldots, a_n \in \mathbb{R}$, not all zero, we have that:
\[ \sum_{i=1}^{n} a_i f_i(x) = 0, \qquad \forall x \in I. \]
Chapter 219


15A03 — Vector spaces, linear 
dependence, rank 


219.1 Sylvester’s law 


The rank and signature of a quadratic form are invariant under change of basis.
In matrix terms, if $A$ is a real symmetric matrix there is an invertible matrix $S$ such that
\[
S A S^T = \begin{pmatrix} I_r & 0 & 0 \\ 0 & -I_s & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
where $r$, the rank of $A$, and $s$, the signature of $A$, characterise the congruence class.
219.2 basis

A (Hamel) basis of a vector space is a linearly independent spanning set.

It can be proved that any two bases of the same vector space must have the same cardinality.
This introduces the notion of dimension of a vector space, which is precisely the cardinality
of the basis, and is denoted by $\dim(V)$, where $V$ is the vector space.

The fact that every vector space has a Hamel basis is an important consequence of the axiom of choice
(in fact, that proposition is equivalent to the axiom of choice).

Examples.

• $\beta = \{e_i\}$, $1 \le i \le n$, is a basis for $\mathbb{R}^n$ (the $n$-dimensional vector space over the reals).
For $n = 4$,
\[
\beta = \left\{
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
\right\}
\]

• $\beta = \{1, x, x^2\}$ is a basis for the vector space of polynomials of degree at most 2.

• \[
\beta = \left\{
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
\right\}
\]
is a basis for the vector space of $2 \times 2$ matrices, and so is


g= 2 0 0 1 0 0 0 0 
~ [XO 07710 OF’\0 4)7°\E OF J’ 
• The empty set is a basis for the trivial vector space which consists of the unique element
0.
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219.3 complementary subspace 


Direct sum decomposition. Let U be a|vector space, and V, W C U|subspaces| We say 


that V and W ppan U, and write 
U=V4+wWw 


if every u € U can be expressed as a sum 
U=U+Ww 
for some v € V and w € W. 
If in addition, such a decomposition is unique for all u € U, or equivalently if 
vò = {0}. 
then we say that V and W form a direct sum decomposition of U and write 


U=V W. 
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In such circumstances, we also say that V and W are complementary subspaces. 


Here is a useful characterization of complementary subspaces when U is finite-dimensional.


Proposition 8. Let U, V, W be as above, and suppose that U is finite-dimensional. The
subspaces V and W are complementary if and only if for every basis $v_1, \dots, v_m$ of V and
every basis $w_1, \dots, w_n$ of W, the combined list

\[ v_1, \dots, v_m, w_1, \dots, w_n \]

is a basis of U.


Let us also remark that direct sum decompositions of a vector space U are in one-to-one
correspondence with projections on U.


Orthogonal decomposition. Specializing somewhat, suppose that the ground field K
is either the real or the complex numbers, and that U is either an inner product space or a
unitary space, i.e. U comes equipped with a positive-definite inner product

\[ \langle \cdot, \cdot \rangle : U \times U \to K. \]

In such circumstances, for every subspace $V \subset U$ we define the orthogonal complement of
V, denoted by $V^\perp$, to be the subspace

\[ V^\perp = \{ u \in U : \langle v, u \rangle = 0 \text{ for all } v \in V \}. \]


Proposition 9. Suppose that U is finite-dimensional and $V \subset U$ a subspace. Then, V and
its orthogonal complement $V^\perp$ determine a direct sum decomposition of U.


Note: the Proposition is false if either the finite-dimensionality or the positive-definiteness
assumptions are violated.
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219.4 dimension 
Let V be a vector space over a field K. We say that V is finite-dimensional if there exists a
finite basis of V. Otherwise we call V infinite-dimensional.


It can be shown that every basis of V has the same cardinality. We call this cardinality the
dimension of V. In particular, if V is finite-dimensional, then every basis of V will consist
of a finite set $v_1, \dots, v_n$. We then call the natural number n the dimension of V.


Next, let $U \subset V$ be a subspace. The dimension of the quotient vector space V/U is called the
codimension of U relative to V.




Note: in circumstances where the choice of field is ambiguous, the dimension of a vector
space depends on the choice of field. For example, every complex vector space is also a real
vector space, and therefore has a real dimension, double its complex dimension.


Version: 4 Owner: rmilson Author(s): rmilson 


219.5 every vector space has a basis 


Every vector space has a basis. This result, trivial in the finite case, is in fact rather
surprising when one thinks of infinite-dimensional vector spaces and the definition of a basis.
The theorem is equivalent to the axiom of choice family of axioms and theorems. Here we
will only prove that Zorn's lemma implies that every vector space has a basis.


Let X be any vector space, and let A be the set of linearly independent subsets of X. For
$x, y \in A$, we define $x \le y$ iff $x \subset y$. It is easy to see that this (the canonical order relation on
subsets) defines a partial order on A. For each chain $C \subset A$, define $C' = \bigcup C$. Now any
finite collection of vectors from $C'$ must lie in a single set $c \in C$, and as such they are
linearly independent. This shows that $C' \in A$ and thus $C'$ is an upper bound for C.


According to Zorn's lemma A now has a maximal element, $\mathcal{E}$, which by definition is linearly
independent. But $\mathcal{E}$ is also a spanning set for X, for if this were not true, let $z \in X$ be any vector
not in the span of $\mathcal{E}$. Then $\mathcal{E} \cup \{z\}$ would again be a linearly independent (*) set larger than
$\mathcal{E}$, contradicting that $\mathcal{E}$ is a maximal element. Thus $\mathcal{E}$ is a basis for X.
□


(*) If not, let $(x_i)$ be a finite collection of vectors from $\mathcal{E}$ so that
$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + a_{n+1} z = 0$, and not all $a_i$ are 0. We must then have
$a_{n+1} \ne 0$, because if not we would have a non-trivial linear combination of vectors from $\mathcal{E}$
equalling the zero vector, contrary to $\mathcal{E}$ being a linearly independent set. But then

\[ z = -\frac{1}{a_{n+1}} (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n), \]

contrary to z not being in the span of $\mathcal{E}$.
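In finite dimensions the maximality argument needs no appeal to Zorn's lemma: one can keep adjoining vectors outside the current span until none remain. The following is a minimal sketch of that finite analogue, not the proof itself. The names `rank` and `extend_to_basis` are made up for this example, and the input list is assumed to be linearly independent already.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def extend_to_basis(independent, candidates):
    """Greedily adjoin each candidate z that is not in the span of the current set."""
    basis = list(independent)
    for z in candidates:
        # z lies outside the span exactly when adding it raises the rank.
        if rank(basis + [z]) > len(basis):
            basis.append(z)
    return basis

start = [[1, 1, 0]]                            # assumed linearly independent
standard = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # a spanning set of Q^3
basis = extend_to_basis(start, standard)
print(basis)
```

Because the candidates span $\mathbb{Q}^3$, the result is a maximal linearly independent set and hence a basis, which mirrors the role of the maximal element in the argument above.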
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219.6 flag 


Let V be a finite-dimensional vector space. A filtration of subspaces
\[ V_1 \subset V_2 \subset \cdots \subset V_n = V \]
is called a flag in V. We speak of a complete flag when
\[ \dim V_i = i \]
for each i = 1, ..., n.
Next, putting
\[ d_k = \dim V_k, \quad k = 1, \dots, n, \]


we say that a list of vectors $(u_1, \dots, u_{d_n})$ is an adapted basis relative to the flag, if the first
$d_1$ vectors give a basis of $V_1$, the first $d_2$ vectors give a basis of $V_2$, etc. Thus, an alternate
characterization of a complete flag is that the first k elements of an adapted basis are a
basis of $V_k$.


Example Let us consider $\mathbb{R}^n$. For each k = 1, ..., n let $V_k$ be the span of $e_1, \dots, e_k$, where
$e_j$ denotes the j-th basic vector, i.e. the column vector with 1 in the j-th position and zeros
everywhere else. The $V_k$ give a complete flag in $\mathbb{R}^n$. The list $(e_1, e_2, \dots, e_n)$ is an adapted
basis relative to this flag, but the list $(e_2, e_1, \dots, e_n)$ is not.
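The example can be checked mechanically. In the sketch below, which is an illustration and not part of the original entry, a flag is assumed to be given as a list of spanning lists, one per subspace, and `rank` and `is_adapted` are names chosen for this example. A list is adapted when, for every k, its first $d_k$ vectors are independent and lie in $V_k$.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_adapted(vectors, flag):
    """flag: list of spanning lists for V_1, V_2, ..., V_n (assumed nested)."""
    for gens in flag:
        d = rank(gens)                      # d_k = dim V_k
        head = vectors[:d]
        independent = len(head) == d and rank(head) == d
        inside = rank(gens + head) == d     # the first d_k vectors add nothing to V_k
        if not (independent and inside):
            return False
    return True

e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
flag = [[e1], [e1, e2], [e1, e2, e3]]       # the complete flag of the example, n = 3

print(is_adapted([e1, e2, e3], flag), is_adapted([e2, e1, e3], flag))
```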
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219.7 frame 


Introduction Frames and coframes are notions closely related to the notions of basis and
dual basis. As such, frames and coframes are needed to describe the connection between
list vectors and the more general abstract vectors.


Frames and bases. Let U be a finite-dimensional vector space over a field K, and let I be
a (finite) totally ordered set of indices¹, e.g. (1, 2, ..., n). We will call a mapping $F : I \to U$
a reference frame, or simply a frame. To put it plainly, F is just a list of elements of U with
indices belonging to I. We will adopt a notation to reflect this and write $F_i$ instead of $F(i)$.
Subscripts are used when writing the frame elements because it is best to regard a frame as
a row-vector² whose entries happen to be elements of U, and write

\[ F = (F_1, \dots, F_n). \]

This is appropriate because every reference frame F naturally corresponds to a linear mapping
$\hat{F} : K^I \to U$ defined by

\[ a \mapsto \sum_{i \in I} a^i F_i, \quad a \in K^I. \]

1 It is advantageous to allow general indexing sets, because one can indicate the use of multiple frames
of reference by employing multiple, disjoint sets of indices.
2 It is customary to use superscripts for the components of a column vector and subscripts for the
components of a row vector. This is fully described in the vector entry.

In other words, $\hat{F}$ is a linear form on $K^I$ that takes values in U instead of K. We use row
vectors to represent linear forms, and that's why we write the frame as a row vector.


We call F a coordinate frame (equivalently, a basis), if $\hat{F}$ is an isomorphism of vector spaces.
Otherwise we call F degenerate, incomplete, or both, depending on whether $\hat{F}$ fails to be,
respectively, injective and surjective.


Coframes and coordinates. In cases where F is a basis, the inverse isomorphism
\[ \hat{F}^{-1} : U \to K^I \]
is called the coordinate mapping. It is cumbersome to work with this inverse explicitly, and
instead we introduce linear forms $x^i \in U^*$, $i \in I$ defined by

\[ x^i : u \mapsto \hat{F}^{-1}(u)(i), \quad u \in U. \]

Each $x^i$, $i \in I$ is called the i-th coordinate function relative to F, or simply the i-th coordinate³.
In this way we obtain a mapping

\[ x : I \to U^*, \quad i \mapsto x^i \]

called the coordinate coframe or simply a coframe. The forms $x^i$, $i \in I$ give a basis of $U^*$. It
is the dual basis of $F_i$, $i \in I$, i.e.

\[ x^i(F_j) = \delta^i_j, \quad i, j \in I, \]
where $\delta^i_j$ is the well-known Kronecker symbol.


In full duality to the custom of writing frames as row-vectors, we write the coframe as a
column vector whose components are the coordinate functions:

\[ \begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix}. \]

3 Strictly speaking, we should denote the coframe by $x_F$ and the coordinate functions by $x^i_F$, so as
to reflect their dependence on the choice of reference frame. Historically, writers have been loath to do this,
preferring a couple of different notational tricks to avoid ambiguity. The cleanest approach is to use different
symbols, e.g. $x^i$ versus $y^j$, to distinguish coordinates coming from different frames. Another approach is to
use distinct indexing sets; in this way the indices themselves will indicate the choice of frame. Say we have
two frames $F : I \to U$ and $G : J \to U$ with I and J distinct finite sets. We stipulate that the symbol i
refers to elements of I and that j refers to elements of J, and write $x^i$ for coordinates relative to F and $x^j$
for coordinates relative to G. That's the way it was done in all the old-time geometry and physics papers,
and is still largely the way physicists go about writing coordinates. Be that as it may, the notation has its
problems and is the subject of long-standing controversy, named by mathematicians the debauche of indices.
The problem is that the notation employs the same symbol, namely x, to refer to two different objects,
namely a map with domain I and another map with domain J. In practice, ambiguity is avoided because
the old-time notation never refers to the coframe (or indeed any tensor) without also writing the indices.
This is the classical way of the dummy variable, a cousin to the f(x) notation. It creates some confusion for
beginners, but with a little practice it's a perfectly serviceable and useful way to communicate.

We identify $\hat{F}^{-1}$ and x with the above column-vector. This is quite natural because all of
these objects are in natural correspondence with a K-valued function of two arguments,

\[ U \times I \to K, \]

that maps an abstract vector $u \in U$ and an index $i \in I$ to a scalar $x^i(u)$, called the i-th
component of u relative to the reference frame F.


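Change of frame. Given two coordinate frames $F : I \to U$ and $G : J \to U$, one can
easily show that I and J must have the same cardinality. Letting $x^i$, $i \in I$ and $y^j$, $j \in J$
denote the coordinate functions relative to F and G, respectively, we define the transition
matrix from F to G to be the matrix

\[ M : I \times J \to K \]

with entries
\[ M^j_i = y^j(F_i), \quad i \in I, \ j \in J. \]

An equivalent description of the transition matrix is given by

\[ y^j = \sum_{i \in I} M^j_i x^i \quad \text{for all } j \in J. \]

It is also the custom to regard the elements of I as indexing the columns of the matrix, while
the elements of J label the rows. Thus, for $I = (1, 2, \dots, n)$ and $J = (\bar{1}, \bar{2}, \dots, \bar{n})$, we can
write
\[ M = \begin{pmatrix} M^{\bar{1}}_1 & \cdots & M^{\bar{1}}_n \\ \vdots & \ddots & \vdots \\ M^{\bar{n}}_1 & \cdots & M^{\bar{n}}_n \end{pmatrix}. \]
In this way we can describe the relation between coordinates relative to the two frames in
terms of ordinary matrix multiplication. To wit, we can write

\[ \begin{pmatrix} y^{\bar{1}} \\ \vdots \\ y^{\bar{n}} \end{pmatrix} = \begin{pmatrix} M^{\bar{1}}_1 & \cdots & M^{\bar{1}}_n \\ \vdots & \ddots & \vdots \\ M^{\bar{n}}_1 & \cdots & M^{\bar{n}}_n \end{pmatrix} \begin{pmatrix} x^1 \\ \vdots \\ x^n \end{pmatrix}. \]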
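To make the index bookkeeping concrete, here is a small sketch for $U = \mathbb{Q}^2$, where a frame is a pair of vectors written in the standard coordinates. It is an illustration under that assumption and not part of the original entry. The names `coords` and `transition_matrix` are made up, and coordinates are found by Cramer's rule, which only covers the two-dimensional case.

```python
from fractions import Fraction

def coords(frame, u):
    """Coordinates of u relative to a frame (pair of vectors in Q^2), by Cramer's rule."""
    (a, c), (b, d) = frame            # solve x1*(a, c) + x2*(b, d) = u
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("frame is degenerate, not a coordinate frame")
    p, q = u
    return [(p * d - b * q) / det, (a * q - p * c) / det]

def transition_matrix(F, G):
    """Rows are indexed by j (frame G), columns by i (frame F): M[j][i] = y^j(F_i)."""
    columns = [coords(G, Fi) for Fi in F]
    return [[columns[i][j] for i in range(2)] for j in range(2)]

F = [(1, 0), (0, 1)]
G = [(1, 1), (0, 1)]
M = transition_matrix(F, G)

u = (3, 5)
x = coords(F, u)                      # coordinates of u relative to F
y = coords(G, u)                      # coordinates of u relative to G
Mx = [sum(M[j][i] * x[i] for i in range(2)) for j in range(2)]
print(M, y, Mx)
```

The final line compares the G-coordinates of u computed directly with those obtained by multiplying the F-coordinates by M, as in the matrix equation above.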


Notes. The term frame is often used to refer to objects that should properly be called a
moving frame. The latter can be thought of as a field of frames, or functions taking values
in the space of all frames, and are fully described elsewhere. The confusion in terminology
is unfortunate but quite common, and is related to the questionable practice of using the
word scalar when referring to a scalar field (a.k.a. scalar-valued functions) and using the
word vector when referring to a vector field.

We also mention that in the world of theoretical physics, the preferred terminology seems to
be polyad and related specializations, rather than frame. Most commonly used are dyad, for
a frame of two elements, and tetrad for a frame of four elements.
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219.8 linear combination 


If $v_1, \dots, v_n$ is a collection of vectors in a vector space V, then a linear combination of the
$v_i \in V$ is a vector u of the form
\[ u = \sum_{i=1}^{n} a_i v_i \]
for any $a_i$ which are elements of the field F over which V is defined. Thus if
$u = a_1 v_1 + a_2 v_2$, where $a_1, a_2 \in \mathbb{R}$, then u is a linear combination of $v_1$ and $v_2$.
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219.9 linear independence 


Let V be a vector space over a field F. The vectors $v_1, v_2, \dots, v_n \in V$ are linearly
independent if the following condition holds for scalars $\lambda_1, \lambda_2, \dots, \lambda_n \in F$:
\[ \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n = 0 \quad \text{implies} \quad \lambda_1 = \lambda_2 = \cdots = \lambda_n = 0. \]

Otherwise, if this condition fails, the vectors are said to be linearly dependent. Furthermore,
an infinite set of vectors is linearly independent if all finite subsets are linearly independent.


In the case of two vectors, linear independence means that neither of these vectors is a scalar
multiple of the other.


As an alternate characterization of dependence, we have that a set of vectors is linearly
dependent if and only if some vector in the set lies in the linear span of the other vectors in
the set.
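For finitely many vectors in $\mathbb{Q}^n$ the defining condition can be tested by computing a rank: the vectors are independent exactly when the rank of the matrix they form equals their number. The sketch below is illustrative only and not part of the original entry, and the names `rank` and `linearly_independent` are made up. The second example shows why, for two vectors, it matters that neither is a multiple of the other.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """True if only the trivial combination of `vectors` gives the zero vector."""
    return rank(vectors) == len(vectors)

print(linearly_independent([[1, 0], [1, 1]]))   # neither is a multiple of the other
print(linearly_independent([[1, 2], [0, 0]]))   # the zero vector is 0 times the first
print(linearly_independent([[1, 2], [2, 4]]))   # the second is twice the first
```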
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219.10 list vector 


Let K be a field and n a positive natural number. We define $K^n$ to be the set of all mappings
from the index list (1, 2, ..., n) to K. Such a mapping $a \in K^n$ is just a formal way of speaking
of a list of field elements $a^1, \dots, a^n \in K$.
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The above description is somewhat restrictive. A more flexible definition of a list vector is 
the following. Let I be a finite list of indices⁴; I = (1, ..., n) is one such possibility, and let
$K^I$ denote the set of all mappings from I to K. A list vector, an element of $K^I$, is just such
a mapping. Conventionally, superscripts are used to denote the values of a list vector, i.e.
for $u \in K^I$ and $i \in I$, we write $u^i$ instead of $u(i)$.


We add and scale list vectors point-wise, i.e. for $u, v \in K^I$ and $k \in K$, we define $u + v \in K^I$ and
$ku \in K^I$, respectively by

\[ (u + v)^i = u^i + v^i, \quad i \in I, \]
\[ (ku)^i = k u^i, \quad i \in I. \]

We also have the zero vector $0 \in K^I$, namely the constant mapping
\[ 0^i = 0, \quad i \in I. \]

The above operations give $K^I$ the structure of an (abstract) vector space over K.


Long-standing traditions of linear algebra hold that elements of $K^I$ be regarded as column
vectors. For example, we write $a \in K^n$ as

\[ a = \begin{pmatrix} a^1 \\ a^2 \\ \vdots \\ a^n \end{pmatrix}. \]

Row vectors are usually taken to represent linear forms on $K^I$. In other words, row vectors
are elements of the dual space $(K^I)^*$. The components of a row vector are customarily
written with subscripts, rather than superscripts. Thus, we express a row vector $\alpha \in (K^n)^*$
as

\[ \alpha = (\alpha_1, \alpha_2, \dots, \alpha_n). \]
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219.11 nullity 


The nullity of a linear mapping is the dimension of the mapping's kernel. For a linear
mapping $T : V \to W$, the nullity of T gives the number of linearly independent solutions to
the equation
\[ T(v) = 0, \quad v \in V. \]

The nullity is zero if and only if the linear mapping in question is injective.

4 Distinct index sets are often used when working with multiple frames of reference.
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219.12 orthonormal basis 


Let X be an inner product space over a field F and $\{x_\alpha\}_{\alpha \in J} \subset X$ be a set of orthonormal
vectors in the space. If we can write any vector in our space as the sum of vectors from the set
multiplied by elements of the field, or in symbols
\[ \forall x \in X : \exists \{a_\alpha\}_{\alpha \in J} \subset F : x = \sum_{\alpha \in J} a_\alpha x_\alpha, \]
then we say that $\{x_\alpha\}$ form an orthonormal basis for X.
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219.13 physical vector 


Definition. Let $\mathcal{L}$ be a collection of labels $\alpha \in \mathcal{L}$. For each ordered pair of labels
$(\alpha, \beta) \in \mathcal{L} \times \mathcal{L}$ let $M^\beta_\alpha$ be a non-singular n x n matrix, the collection of all such
satisfying the following functor-like consistency conditions:


• For all $\alpha \in \mathcal{L}$, the matrix $M^\alpha_\alpha$ is the identity matrix.


• For all $\alpha, \beta, \gamma \in \mathcal{L}$ we have
\[ M^\gamma_\alpha = M^\gamma_\beta M^\beta_\alpha, \]
where the product on the right-hand side is just ordinary matrix multiplication.


We then impose an equivalence relation by stipulating that for all $\alpha, \beta \in \mathcal{L}$ and $u \in \mathbb{R}^n$, the
pair $(\alpha, u)$ is equivalent to the pair $(\beta, M^\beta_\alpha u)$. Finally, we define a physical vector to be an
equivalence class of such pairs relative to the just-defined relation.


The idea behind this definition is that the $\alpha \in \mathcal{L}$ are labels of various coordinate systems,
and that the matrices $M^\beta_\alpha$ encode the corresponding changes of coordinates. For label $\alpha \in \mathcal{L}$
and list-vector $u \in \mathbb{R}^n$ we think of the pair $(\alpha, u)$ as the representation of a physical vector
relative to the coordinate system $\alpha$.
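A tiny model of this definition may help. The sketch below is an illustration and not part of the original entry. It assumes two hypothetical coordinate labels "a" and "b" related by a quarter-turn rotation of the plane, and the names `consistent` and `equivalent` are made up. It checks the two consistency conditions and then tests whether two labelled list vectors represent the same physical vector.

```python
def matmul(X, Y):
    """Ordinary product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matvec(X, u):
    return [sum(x * c for x, c in zip(row, u)) for row in X]

IDENTITY = [[1, 0], [0, 1]]
ROT = [[0, -1], [1, 0]]        # assumed change of coordinates from "a" to "b"
ROT_INV = [[0, 1], [-1, 0]]    # and back from "b" to "a"

labels = ["a", "b"]
# M[(alpha, beta)] plays the role of the matrix taking alpha-coordinates to beta-coordinates.
M = {("a", "a"): IDENTITY, ("b", "b"): IDENTITY, ("a", "b"): ROT, ("b", "a"): ROT_INV}

def consistent(M, labels):
    """Check both conditions of the definition for every choice of labels."""
    identity_ok = all(M[(a, a)] == IDENTITY for a in labels)
    compose_ok = all(
        M[(a, c)] == matmul(M[(b, c)], M[(a, b)])
        for a in labels for b in labels for c in labels
    )
    return identity_ok and compose_ok

def equivalent(p, q):
    """(alpha, u) is equivalent to (beta, v) when v is the image of u under M[(alpha, beta)]."""
    (a, u), (b, v) = p, q
    return matvec(M[(a, b)], u) == list(v)

print(consistent(M, labels))
print(equivalent(("a", [1, 0]), ("b", [0, 1])))
print(equivalent(("a", [1, 0]), ("b", [1, 0])))
```

The consistency conditions are what make `equivalent` reflexive, symmetric and transitive, so that equivalence classes of labelled pairs are well defined.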


Discussion. All scientific disciplines have a need for formalization. However, the extent to 
which rigour is pursued varies from one discipline to the next. Physicists and engineers are 
more likely to regard mathematics as a tool for modeling and prediction. As such they are 
likely to blur the distinction between list vectors and physical vectors. Consider, for example,
the following excerpt from R. Feynman's “Lectures on physics” [1].


All quantities that have a direction, like a step in space, are called vectors. A
vector is three numbers. In order to represent a step in space, ..., we really need
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three numbers, but we are going to invent a single mathematical symbol, r, which 
is unlike any other mathematical symbols we have so far used. It is not a single 
number, it represents three numbers: x, y, and z. It means three numbers, but 
not only those three numbers, because if we were to use a different coordinate 
system, the three numbers would be changed to x', y', and z'. However, we
want to keep our mathematics simple and so we are going to use the same mark
to represent the three numbers (x, y, z) and the three numbers (x', y', z'). That
is, we use the same mark to represent the first set of three numbers for one 
coordinate system, but the second set of three numbers if we are using the other 
coordinate system. This has the advantage that when we change the coordinate 
system, we do not have to change the letters of our equations. 


Surely you are joking Mr. Feynman!? What are we supposed to make of this definition? We 
learn that a vector is both a physical quantity, and a list of numbers. However we also learn 
that it is not really a specific list of numbers, but rather any of a number of possible lists. 
Furthermore, the choice of which list is being used is dependent on the context (choice of 
coordinate system), but this is not really important because we just end up using the same 
symbol r regardless. 


What a muddle! Even at the informal level one can do better than Feynman. The central
weakness of his definition is that he is unwilling to distinguish between physical vectors
(quantities) and their representation (lists of numbers). Here is an alternative physical
definition from a book by R. Aris on fluid mechanics [2].


There are many physical quantities with which only a single magnitude can be 
associated. For example, when suitable units of mass and length have been
adopted the density of a fluid may be measured. ... There are other quantities
associated with a point that have not only a magnitude but also a direction. If
a force of 1 lb weight is said to act at a certain point, we can still ask in what
direction the force acts and it is not fully specified until this direction is given.
Such a physical quantity is a vector. ... We distinguish therefore between the
vector as an entity and its components, which allow us to reconstruct it in a
particular system of reference. The set of components is meaningless unless the 
system of reference is also prescribed, just as the magnitude 62.427 is meaningless 
as a density until the units are also prescribed. .... 


Definition. A Cartesian vector, a, in three dimensions is a quantity with three
components $a_1, a_2, a_3$ in the frame of reference O123, which, under rotation of
the coordinate frame to $O\bar{1}\bar{2}\bar{3}$, become components $\bar{a}_1, \bar{a}_2, \bar{a}_3$, where

\[ \bar{a}_j = l_{ij} a_i. \]

The vector a is to be regarded as an entity, just as the physical quantity it
represents is an entity. It is sometimes convenient to use the bold face a to show
this. In any particular coordinate system it has components $a_1, a_2, a_3$, and it is
at other times convenient to use the typical component $a_i$.
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Here we see a carefully drawn distinction between physical quantities and the numerical 
measurements that represent them. A system of measurement, i.e. a choice of units and/or
coordinate axes, turns physical quantities into numbers. However the correspondence is
not fixed, but varies according to the choice of measurement system. This point of view can
be formalized by representing physical vectors as labeled list vectors, the label specifying a 
choice of measurement systems. The actual vector is then defined to be an equivalence class 
of such labeled list vectors. 
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219.14 proof of rank-nullity theorem 


Let $T : V \to W$ be a linear mapping with V finite-dimensional. We wish to show that

\[ \dim V = \dim \operatorname{Ker} T + \dim \operatorname{Im} T. \]

The images of a basis of V will span Im T, and hence Im T is finite-dimensional. Choose
then a basis $w_1, \dots, w_n$ of Im T and choose $v_1, \dots, v_n \in V$ such that

\[ w_i = T(v_i), \quad i = 1, \dots, n. \]

Choose a basis $u_1, \dots, u_k$ of Ker T. The result will follow once we show that
$u_1, \dots, u_k, v_1, \dots, v_n$ is a basis of V.


Let $v \in V$ be given. Since $T(v) \in \operatorname{Im} T$, by definition, we can choose scalars $b_1, \dots, b_n$ such
that
\[ T(v) = b_1 w_1 + \cdots + b_n w_n. \]

Linearity of T now implies that $T(b_1 v_1 + \cdots + b_n v_n - v) = 0$, and hence we can choose scalars
$a_1, \dots, a_k$ such that
\[ b_1 v_1 + \cdots + b_n v_n - v = a_1 u_1 + \cdots + a_k u_k. \]

Therefore $u_1, \dots, u_k, v_1, \dots, v_n$ span V.
Next, let $a_1, \dots, a_k, b_1, \dots, b_n$ be scalars such that
\[ a_1 u_1 + \cdots + a_k u_k + b_1 v_1 + \cdots + b_n v_n = 0. \]
By applying T to both sides of this equation it follows that
\[ b_1 w_1 + \cdots + b_n w_n = 0, \]

and since $w_1, \dots, w_n$ are linearly independent, that

\[ b_1 = b_2 = \cdots = b_n = 0. \]
Consequently
\[ a_1 u_1 + \cdots + a_k u_k = 0 \]
as well, and since $u_1, \dots, u_k$ are also assumed to be linearly independent we conclude that
\[ a_1 = a_2 = \cdots = a_k = 0 \]
also. Therefore $u_1, \dots, u_k, v_1, \dots, v_n$ are linearly independent, and are therefore a basis.
Q.E.D.
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219.15 rank 


The rank of a linear mapping is the dimension of the mapping's image. For a linear mapping
$T : V \to W$, the rank of T gives the number of independent linear constraints on $v \in V$
imposed by the equation

\[ T(v) = 0. \]

The rank of a linear mapping is equal to the dimension of the codomain if and only if the
mapping in question is surjective (assuming the codomain is finite-dimensional).


The rank of a linear mapping is equal to the dimension of the domain if and only if the
mapping in question is injective (assuming the domain is finite-dimensional).
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219.16 rank-nullity theorem 


The sum of the rank and the nullity of a linear mapping gives the dimension of the mapping's
domain. More precisely, let $T : V \to W$ be a linear mapping. If V is finite-dimensional,
then

\[ \dim V = \dim \operatorname{Ker} T + \dim \operatorname{Img} T. \]


The intuitive content of the Rank-Nullity theorem is the principle that 
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Every independent linear constraint takes away one degree of freedom. 


The rank is just the number of independent linear constraints on v € V imposed by the 
equation 
\[ T(v) = 0. \]


The dimension of V is the number of unconstrained degrees of freedom. The nullity is the 
degrees of freedom in the resulting space of solutions. To put it yet another way: 


The number of variables minus the number of independent linear constraints 


equals the number of linearly independent solutions. 
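The counting principle can be checked on a small matrix. The sketch below is illustrative and not part of the original entry, and the names `rref`, `kernel_basis` and `apply` are made up. It row-reduces a matrix T over the rationals, reads off the rank as the number of pivot columns, builds one kernel vector per free column, and confirms that rank plus nullity equals the number of variables.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals; returns (matrix, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    for c in range(len(m[0])):
        r = len(pivots)
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
    return m, pivots

def kernel_basis(rows):
    """One solution of T(v) = 0 per free (non-pivot) column."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -m[r][free]
        basis.append(v)
    return basis

def apply(T, v):
    return [sum(a * b for a, b in zip(row, v)) for row in T]

T = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]   # the second row repeats the first constraint
rank_T = len(rref(T)[1])
kernel = kernel_basis(T)
print(rank_T, len(kernel), kernel)
```

Here three variables are subject to two independent constraints, leaving a one-dimensional space of solutions.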
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219.17 similar matrix 


Definition A square matrix A is similar to a square matrix B if there exists a nonsingular
square matrix S such that
\[ A = S^{-1} B S. \quad (219.17.1) \]

Because given S we can define $R = S^{-1}$ and have $A = R B R^{-1}$, whether the inverse comes
first or last does not matter.


Transformations of the form $S^{-1} B S$ (or $S B S^{-1}$) are called similarity transformations.


Discussion Similarity is useful for turning recalcitrant matrices into pliant ones. The
canonical example is that a diagonalizable matrix A is similar to the diagonal matrix of its
eigenvalues $\Lambda$, with the matrix of its eigenvectors T acting as the similarity transformation.
That is,

\[ A = T \Lambda T^{-1} \quad (219.17.2) \]

\[ = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix} \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}^{-1} \quad (219.17.3) \]

This follows directly from the equation defining eigenvalues and eigenvectors,

\[ A T = T \Lambda. \quad (219.17.4) \]
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Through this transformation, we’ve turned A into the product of two 


and a diagonal matrix. This can be very useful. As an application, see the solution for the
normalizing constant of a multidimensional Gaussian integral.


Properties of similar matrices 


1. Similarity is reflexive. All square matrices A are similar to themselves via the similarity
transformation $A = I^{-1} A I$, where I is the identity matrix.


2. Similarity is symmetric. If A is similar to B, then B is similar to A, as we can define
a matrix $R = S^{-1}$ and have
\[ B = R^{-1} A R. \quad (219.17.5) \]


3. Similarity is transitive. If A is similar to B which is similar to C, say $B = R^{-1} C R$, we have

\[ A = S^{-1} B S = S^{-1} (R^{-1} C R) S = (RS)^{-1} C (RS). \quad (219.17.6) \]


4. Because of 1, 2 and 3, similarity defines an equivalence relation (reflexive, transitive 
and symmetric) on square matrices, partitioning the space of such matrices into a 
disjoint set of equivalence classes. 


5. If A is similar to B, then their determinants are equal, |A| = |B|. This is easily
verified:

\[ |A| = |S^{-1} B S| = |S^{-1}| |B| |S| = |S|^{-1} |B| |S| = |B|. \quad (219.17.7) \]

6. Similar matrices represent the same linear transformation after a change of basis.
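Properties 2 and 5 can be spot-checked with exact arithmetic. The sketch below is an illustration and not part of the original entry. It is restricted to 2 x 2 matrices, the particular B and S are arbitrary choices, and `matmul`, `det` and `inverse` are names made up for this example.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inverse(X):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    d = Fraction(det(X))
    if d == 0:
        raise ValueError("matrix is singular")
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

B = [[2, 1], [0, 3]]
S = [[1, 1], [0, 1]]                        # nonsingular, det(S) = 1

A = matmul(matmul(inverse(S), B), S)        # A = S^{-1} B S
R = inverse(S)
B_back = matmul(matmul(inverse(R), A), R)   # B = R^{-1} A R with R = S^{-1}

print(A, det(A), det(B), B_back)
```

With this choice of S the matrix A comes out diagonal, so the columns of S happen to be eigenvectors of B.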
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219.18 span 


The span of a set of vectors $v_1, \dots, v_n$ is the set of linear combinations $a_1 v_1 + \cdots + a_n v_n$. It's
denoted $\operatorname{Sp}(v_1, \dots, v_n)$. The standard basis vectors $\hat{\imath}$ and $\hat{\jmath}$ span $\mathbb{R}^2$ because every vector of
$\mathbb{R}^2$ can be represented as a linear combination of $\hat{\imath}$ and $\hat{\jmath}$. For vectors in $\mathbb{R}^m$,
$\operatorname{Sp}(v_1, \dots, v_n)$ is a subspace of $\mathbb{R}^m$ and is the smallest subspace containing $v_1, \dots, v_n$.

Span is both a noun and a verb; a set of vectors can span a vector space, and a vector can
be in the span of a set of vectors.

Checking span: To see whether a vector is in the span of other vectors, one can set up
an augmented matrix, since if u is in the span of $v_1, v_2$, then $u = x_1 v_1 + x_2 v_2$. This is a
system of linear equations. Thus, if it has a solution, u is in the span of $v_1, v_2$. Note that
the solution does not have to be unique for u to be in the span.
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To see whether a set of vectors spans a vector space, you need to check that there are at 


least as many linearly independent vectors as the dimension of the space. For example, it
can be shown that in $\mathbb{R}^n$, n + 1 vectors are never linearly independent, and n - 1 vectors
never span.
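Both checks reduce to rank computations. The sketch below is illustrative and not part of the original entry, and the names `rank`, `in_span` and `spans` are made up. A vector lies in the span of a list exactly when adjoining it does not raise the rank, which is the same as the augmented system being solvable, and a list spans $\mathbb{Q}^n$ when its rank is n.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_span(vectors, w):
    """True if w is a linear combination of `vectors`."""
    return rank(vectors + [w]) == rank(vectors)

def spans(vectors, dim):
    """True if `vectors` span Q^dim."""
    return rank(vectors) == dim

v1, v2 = [1, 0, 1], [0, 1, 1]
print(in_span([v1, v2], [2, 3, 5]))    # 2*v1 + 3*v2
print(in_span([v1, v2], [0, 0, 1]))
print(spans([v1, v2], 3), spans([v1, v2, [0, 0, 1]], 3))
```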


Version: 6 Owner: slider142 Author(s): slider142 


219.19 theorem for the direct sum of finite dimensional 
vector spaces 


Theorem Let S and T be subspaces of a finite dimensional vector space V. Then V is the
direct sum of S and T, i.e., $V = S \oplus T$, if and only if dim V = dim S + dim T and $S \cap T = \{0\}$.


Proof. Suppose that $V = S \oplus T$. Then, by definition, V = S + T and $S \cap T = \{0\}$. The
dimension theorem for subspaces states that
\[ \dim(S + T) + \dim S \cap T = \dim S + \dim T. \]
Since the dimension of the zero vector space {0} is zero, we have that
\[ \dim V = \dim S + \dim T, \]

and the first direction of the claim follows.


For the other direction, suppose dim V = dim S + dim T and $S \cap T = \{0\}$. Then the
dimension theorem for subspaces implies that

\[ \dim(S + T) = \dim V. \]
Now S + T is a subspace of V with the same dimension as V so, by Theorem 1 on this page,
V = S + T. This proves the second direction. □
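The criterion translates directly into a computation when S and T are given by spanning lists in $\mathbb{Q}^n$. The sketch below is illustrative and not part of the original entry, and `rank` and `is_direct_sum` are names made up for this example. It obtains the dimension of the intersection from the dimension theorem for subspaces used in the proof.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_direct_sum(S, T, dim_V):
    """S, T: spanning lists of two subspaces of Q^dim_V."""
    dim_S, dim_T = rank(S), rank(T)
    dim_sum = rank(S + T)                  # dim(S + T)
    dim_cap = dim_S + dim_T - dim_sum      # dim of the intersection, by the dimension theorem
    return dim_S + dim_T == dim_V and dim_cap == 0

S = [[1, 0, 0], [0, 1, 0]]
print(is_direct_sum(S, [[1, 1, 1]], 3))    # a line meeting the plane S only at 0
print(is_direct_sum(S, [[1, 1, 0]], 3))    # a line inside the plane S
```

The second call shows that the dimension count alone is not enough: the dimensions add up to 3, but the intersection is not trivial.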
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219.20 vector 


Overview. The word vector has several distinct, but interrelated meanings. The present 
entry is an overview and discussion of these concepts, with [links] at the end to more detailed 
definitions. 


• A list vector (follow the link to the formal definition) is a finite list of numbers⁵.
Most commonly, the vector is composed of real numbers, in which case a list vector is
just an element of $\mathbb{R}^n$. Complex numbers are also quite common, and then we speak
of a complex vector, an element of $\mathbb{C}^n$. Lists of ones and zeroes are also utilized, and
are referred to as binary vectors. More generally, one can use any field K, in which
case a list vector is just an element of $K^n$.


5 Infinite vectors arise in areas such as functional analysis and quantum mechanics, but require a much
more complicated and sophisticated theory.


• A physical vector (follow the link to a formal definition and in-depth discussion)
is a geometric quantity that corresponds to a linear displacement. Indeed, it is customary
to depict a physical vector as an arrow. By choosing a system of coordinates
a physical vector v can be represented by a list vector $(v^1, \dots, v^n)^T$. Physically, no
single system of measurement can be preferred to any other, and therefore such a
representation is not canonical. A linear change of coordinates induces a corresponding
linear transformation of the representing list vector.


In most physical applications vectors have a magnitude as well as a direction, and then
we speak of a Euclidean vector. When lengths and angles can be measured, it is most
convenient to utilize an orthonormal system of coordinates. In this case, the magnitude
of a Euclidean vector v is given by the usual Euclidean norm of the corresponding list
vector,

\[ \|v\| = \sqrt{\sum_i (v^i)^2}. \]
This definition is independent of the choice of orthonormal coordinates.


• An abstract vector is an element of a vector space. An abstract Euclidean vector
is an element of an inner product space. The connection between list vectors and the
more general abstract vectors is fully described in the entry on frames.


Essentially, given a finite dimensional abstract vector space, a choice of a coordinate
frame (which is really the same thing as a basis) sets up a linear bijection between the
abstract vectors and list vectors, and makes it possible to represent the one in terms
of the other. The representation is not canonical, but depends on the choice of frame.
A change of frame changes the representing list vectors by a matrix multiplication.


We also note that the axioms of a vector space make no mention of lengths and angles.
The vector space formalism can be enriched to include these notions. The result is the
axiom system for inner products.


Why do we bother with the “bare-bones” formalism of length-less vectors? The reason 
is that some applications involve velocity-like quantities, but lack a meaningful notion 
of speed. As an example, consider a multi-particle system. The state of the system is
represented as a point in some manifold, and the evolution of the system is represented
by velocity vectors that live in that manifold's tangent space. We can superimpose
and scale these velocities, but it is meaningless to speak of a speed of the evolution.


Discussion. What is a vector? This question is surprisingly difficult to answer. 
Vectors are an essential scientific concept, indispensable for both the physicist and the 
mathematician. It is strange then, that despite the obvious importance, there is no clear,
universally accepted definition of this term. 
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The difficulty is one of semantics! The term vector is ambiguous, but its various meanings 
are interrelated. The different usages of vector call for different formal definitions, which 
are similarly interrelated. List vectors are the most elementary and familiar kind of vectors. 
They are easy to define, and are mathematically precise. However, saying that a vector is 
just a list of numbers leads to conceptual difficulties. 


A physicist needs to be able to say that velocities, forces, fluxes are vectors. A geometer,
and for that matter a pilot, will think of a vector as a kind of spatial displacement.
Everyone would agree that a choice of a vector involves multiple degrees of freedom, and that
vectors can be linearly superimposed. This description of “vector” evokes useful and intuitive
understanding, but is difficult to formalize.


The synthesis of these conflicting viewpoints is the modern mathematical notion of a vector 
space. The key innovation of modern, formal mathematics is the pursuit of generality by 
means of abstraction. To that end, we do not give an answer to “What is a vector?”, 
but rather give a list of properties enjoyed by all objects that one may reasonably term a
“vector”. These properties are just the axioms of an abstract vector space, or as Forrest
Gump [1] might have put it, “A vector is as a vector does.”


The axiomatic approach afforded by vector space theory gives us maximum flexibility. We 
can carry out an analysis of various physical vector spaces by employing propositions based
on vector space axioms, or we can choose a basis and perform the same analysis using list 
vectors. This flexibility is obtained by means of abstraction. We are not obliged to say what 
a vector is; all we have to do is say that these abstract vectors enjoy certain properties, 
and make the appropriate deductions. This is similar to the idea of an abstract in 
object-oriented programming. 


Surprisingly, the idea that a vector is an element of an abstract vector space has not made
great inroads in the physical sciences and engineering. The stumbling block seems to be a
poor understanding of formal, deductive mathematics and the unstated, but implicit attitude
that


formal manipulation of a physical quantity requires that it be represented by one 
or more numbers. 


Great historical irony is at work here. The classical, Greek approach to geometry was purely
synthetic, based on idealized notions like point and line, and on various axioms. Analytic
geometry, à la Descartes, arose much later, but became the mode of thought
in scientific applications and largely overshadowed the synthetic method. The pendulum
began to swing back at the end of the nineteenth century as mathematics became more
formal and important new axiomatic systems, such as vector spaces and fields,
were developed. The cost of increased abstraction in modern mathematics was more than
justified by the improvement in clarity and organization of mathematical knowledge.


Alas, to a large extent physical science and engineering continue to dwell in the 19th century.
The axioms and the formal theory of vector spaces allow one to manipulate formal geometric
entities, such as physical vectors, without first turning everything into numbers. The
increased level of abstraction, however, poses a formidable obstacle toward the acceptance
of this approach. Indeed, mainstream physicists and engineers do not seem in any great
hurry to accept the definition of vector as something that dwells in a vector space. Until this
attitude changes, vector will retain the ambiguous meaning of being both a list of numbers,
and a physical quantity that transforms with respect to matrix multiplication.
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Chapter 220 


15A04 — Linear transformations, 
semilinear transformations 


220.1 admissibility 


Let $k$ be a field, $V$ a vector space over $k$, and $T$ a linear operator over $V$. We say that a
subspace $W$ of $V$ is $T$-admissible if


1. $W$ is a $T$-invariant subspace.
2. If $f \in k[X]$ and $f(T)x \in W$, there is a vector $y \in W$ such that $f(T)x = f(T)y$.
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220.2 conductor of a vector 


Let $k$ be a field, $V$ a vector space, $T : V \to V$ a linear transformation, and $W$ a
$T$-invariant subspace of $V$. Let $x \in V$. The $T$-conductor of $x$ in $W$ is the set $S_T(x, W)$
containing all polynomials $g \in k[X]$ such that $g(T)x \in W$. It happens that this set
is an ideal of the polynomial ring. We also use the term $T$-conductor of $x$ in $W$ to refer
to the generator of this ideal. In the special case $W = \{0\}$, the $T$-conductor is called the
$T$-annihilator of $x$. Another way to define the $T$-conductor of $x$ in $W$ is by saying that it
is a monic polynomial $p$ of lowest degree such that $p(T)x \in W$. Of course this polynomial
happens to be unique. So the $T$-annihilator of $x$ is the monic polynomial $m_x$ of lowest degree
such that $m_x(T)x = 0$.
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220.3 cyclic decomposition theorem 


Let $k$ be a field, $V$ a finite dimensional vector space over $k$, and $T$ a linear operator over $V$.
Let $W_0$ be a proper $T$-admissible subspace of $V$. There are non-zero vectors $x_1, \ldots, x_r$ in $V$
with respective $T$-annihilators $p_1, \ldots, p_r$ such that

1. $V = W_0 \oplus Z(x_1, T) \oplus \cdots \oplus Z(x_r, T)$ (see the cyclic subspace definition)

2. $p_k$ divides $p_{k-1}$ for every $k = 2, \ldots, r$

Moreover, the integer $r$ and the annihilators $p_1, \ldots, p_r$ are uniquely determined by (1), (2) and
the fact that none of the $x_k$ is zero.


Version: 5 Owner: gumau Author(s): gumau 


220.4 cyclic subspace 


Let $k$ be a field, $V$ a vector space over $k$, and $x \in V$. Let $T : V \to V$ be a linear transformation.

The $T$-cyclic subspace generated by $x$ is the smallest $T$-invariant subspace which
contains $x$.

We denote this space by $Z(x, T)$. If $Z(x, T) = V$ we say that $x$ is a cyclic vector of $T$.

Note that $Z(x, T) = \{ p(T)(x) : p \in k[X] \}$.
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220.5 dimension theorem for symplectic complement 
(proof) 


We denote by $V^*$ the dual space of $V$, i.e., linear mappings from $V$ to $\mathbb{R}$. Moreover, we
assume known that $\dim V = \dim V^*$ for any vector space $V$.


We begin by showing that the mapping $S : V \to V^*$, $a \mapsto \omega(a, \cdot)$ is a linear isomorphism.
First, linearity is clear, and since $\omega$ is non-degenerate, $\ker S = \{0\}$, so $S$ is injective. To show
that $S$ is surjective, we apply the rank-nullity theorem to $S$, which yields $\dim V = \dim \operatorname{im} S$.


We now have $\operatorname{im} S \subset V^*$ and $\dim \operatorname{im} S = \dim V^*$. (The first assertion follows directly from
the definition of $S$.) Hence $\operatorname{im} S = V^*$, because a subspace whose dimension equals that of the
finite-dimensional ambient space is the whole space, and $S$ is a surjection. We have shown
that $S$ is a linear isomorphism.


Let us next define the mapping $T : V \to W^*$, $a \mapsto \omega(a, \cdot)$. Applying the rank-nullity theorem
to $T$ yields

\[ \dim V = \dim \ker T + \dim \operatorname{im} T. \qquad (220.5.1) \]

Now $\ker T = W^\omega$ and $\operatorname{im} T = W^*$. To see the latter assertion, first note that from the
definition of $T$, we have $\operatorname{im} T \subset W^*$. Since $S$ is a linear isomorphism, we also have
$\operatorname{im} T \supset W^*$. Then, since $\dim W = \dim W^*$, the result follows from equation (220.5.1).
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220.6 dual homomorphism 


Definition. Let $U, V$ be vector spaces over a field $K$, and $T : U \to V$ be a homomorphism
(a linear map) between them. Letting $U^\vee, V^\vee$ denote the corresponding dual spaces, we
define the dual homomorphism $T^\vee : V^\vee \to U^\vee$ to be the linear mapping with action

\[ \alpha \mapsto \alpha \circ T, \quad \alpha \in V^\vee. \]

We can also characterize $T^\vee$ as the adjoint of $T$ relative to the natural evaluation bracket
between linear forms and vectors:

\[ \langle \cdot, \cdot \rangle_U : U^\vee \times U \to K, \quad \langle \alpha, u \rangle = \alpha(u), \quad \alpha \in U^\vee,\ u \in U. \]

To be more precise, $T^\vee$ is characterized by the condition

\[ \langle T^\vee \alpha, u \rangle_U = \langle \alpha, T u \rangle_V, \quad \alpha \in V^\vee,\ u \in U. \]

If $U$ and $V$ are finite dimensional, we can also characterize the dualizing operation as the
composition of the following canonical isomorphisms:

\[ \operatorname{Hom}(U, V) \xrightarrow{\ \simeq\ } U^\vee \otimes V \xrightarrow{\ \simeq\ } (V^\vee)^\vee \otimes U^\vee \xrightarrow{\ \simeq\ } \operatorname{Hom}(V^\vee, U^\vee). \]


Category theory perspective. The dualizing operation behaves contravariantly with
respect to composition, i.e.

\[ (S \circ T)^\vee = T^\vee \circ S^\vee, \]

for all vector space homomorphisms $S, T$ with suitably matched domains. Furthermore, the
dual of the identity homomorphism is the identity homomorphism of the dual space. Thus,
using the language of category theory, the dualizing operation can be characterized as the
homomorphism action of the contravariant, dual-space functor.


Relation to the matrix transpose. The above properties closely mirror the algebraic
properties of the matrix transpose operation. Indeed, $T^\vee$ is sometimes referred to as the
transpose of $T$, because at the level of matrices the dual homomorphism is calculated by
taking the transpose.


To be more precise, suppose that $U$ and $V$ are finite-dimensional, and let $M \in \operatorname{Mat}_{n,m}(K)$
be the matrix of $T$ relative to some fixed bases of $U$ and $V$. Then, the dual homomorphism
$T^\vee$ is represented as the transposed matrix $M^T \in \operatorname{Mat}_{m,n}(K)$ relative to the corresponding
dual bases of $U^\vee, V^\vee$.
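
The relation between the dual homomorphism and the transpose can be checked numerically. The following is a minimal sketch, assuming that vectors and linear forms are stored as coordinate lists relative to a basis and its dual basis; the matrix M and the sample data alpha and u are hypothetical values chosen only for illustration.

```python
def mat_vec(M, x):
    """Apply the matrix M (a list of rows) to the coordinate vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def transpose(M):
    """Return the transpose of M as a list of rows."""
    return [list(col) for col in zip(*M)]

def pair(alpha, u):
    """Evaluation bracket <alpha, u>: a linear form applied to a vector."""
    return sum(a * b for a, b in zip(alpha, u))

# Hypothetical example: T maps a 3-dimensional U to a 2-dimensional V.
M = [[1, 2, 0],
     [0, -1, 3]]
alpha = [5, 7]       # a linear form on V, in dual-basis coordinates
u = [2, 1, -4]       # a vector of U

lhs = pair(mat_vec(transpose(M), alpha), u)   # <T^vee alpha, u>
rhs = pair(alpha, mat_vec(M, u))              # <alpha, T u>
print(lhs, rhs)
```

Both brackets agree because applying the transposed matrix to the coordinates of a linear form is the same as composing that form with T.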
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220.7 dual homomorphism of the derivative 


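Let $P_n$ denote the vector space of real polynomials of degree $n$ or less, and let $D_n : P_n \to P_{n-1}$
denote the ordinary derivative. Linear forms on $P_n$ can be given in terms of evaluations, and
so we introduce the following notation. For every scalar $k \in \mathbb{R}$, let $\mathrm{Ev}^{(n)}_k \in (P_n)^\vee$ denote the
evaluation

\[ \mathrm{Ev}^{(n)}_k : p \mapsto p(k), \quad p \in P_n. \]

Note: the degree superscript matters! For example:

\[ \mathrm{Ev}^{(1)}_2 = 2\,\mathrm{Ev}^{(1)}_1 - \mathrm{Ev}^{(1)}_0, \]

whereas $\mathrm{Ev}^{(2)}_0, \mathrm{Ev}^{(2)}_1, \mathrm{Ev}^{(2)}_2$ are linearly independent. Let us consider the dual homomorphism
$D_2^\vee$, i.e. the adjoint of $D_2$. We have the following relations:

\[ D_2^\vee\big(\mathrm{Ev}^{(1)}_0\big) = -\tfrac{3}{2}\,\mathrm{Ev}^{(2)}_0 + 2\,\mathrm{Ev}^{(2)}_1 - \tfrac{1}{2}\,\mathrm{Ev}^{(2)}_2, \]
\[ D_2^\vee\big(\mathrm{Ev}^{(1)}_1\big) = -\tfrac{1}{2}\,\mathrm{Ev}^{(2)}_0 + \tfrac{1}{2}\,\mathrm{Ev}^{(2)}_2. \]

In other words, taking $\mathrm{Ev}^{(1)}_0, \mathrm{Ev}^{(1)}_1$ as the basis of $(P_1)^\vee$ and $\mathrm{Ev}^{(2)}_0, \mathrm{Ev}^{(2)}_1, \mathrm{Ev}^{(2)}_2$ as the basis
of $(P_2)^\vee$, the matrix that represents $D_2^\vee$ is just

\[ \begin{pmatrix} -\tfrac{3}{2} & -\tfrac{1}{2} \\ 2 & 0 \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}. \]

Note the contravariant relationship between $D_2$ and $D_2^\vee$. The former turns second degree
polynomials into first degree polynomials, whereas the latter turns first degree evaluations
into second degree evaluations. The matrix of $D_2^\vee$ has 2 columns and 3 rows precisely
because $D_2^\vee$ is a homomorphism from a 2-dimensional vector space to a 3-dimensional
vector space.


By contrast, $D_2$ will be represented by a $2 \times 3$ matrix. The dual basis of $P_1$ is

\[ -x + 1, \quad x, \]

and the dual basis of $P_2$ is

\[ \tfrac{1}{2}(x - 1)(x - 2), \quad x(2 - x), \quad \tfrac{1}{2}x(x - 1). \]

Relative to these bases, $D_2$ is represented by the transpose of the matrix for $D_2^\vee$, namely

\[ \begin{pmatrix} -\tfrac{3}{2} & 2 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 0 & \tfrac{1}{2} \end{pmatrix}. \]

This corresponds to the following three relations:

\[ D_2\big[\tfrac{1}{2}(x - 1)(x - 2)\big] = -\tfrac{3}{2}(-x + 1) - \tfrac{1}{2}x, \]
\[ D_2\big[x(2 - x)\big] = 2(-x + 1) + 0 \cdot x, \]
\[ D_2\big[\tfrac{1}{2}x(x - 1)\big] = -\tfrac{1}{2}(-x + 1) + \tfrac{1}{2}x. \]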
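
The two relations for the dual of the derivative can be verified by direct computation. The sketch below assumes that a polynomial is stored as a coefficient list with the lowest degree first; the helper names are illustrative only. It checks that p'(0) and p'(1) equal the stated combinations of p(0), p(1), p(2) on a basis of the quadratics.

```python
from fractions import Fraction as F

def evaluate(p, k):
    """Evaluate the polynomial p = [c0, c1, c2, ...] (lowest degree first) at k."""
    return sum(c * k**i for i, c in enumerate(p))

def derivative(p):
    """Coefficient list of the derivative of p."""
    return [i * c for i, c in enumerate(p)][1:]

# Coefficients of Ev_0, Ev_1, Ev_2 (on quadratics) in the image of Ev_0 and Ev_1.
DUAL_COLUMNS = {
    0: [F(-3, 2), F(2), F(-1, 2)],   # image of evaluation at 0
    1: [F(-1, 2), F(0), F(1, 2)],    # image of evaluation at 1
}

def check(p):
    """True if p'(k) matches the stated combination of p(0), p(1), p(2) for k = 0, 1."""
    dp = derivative(p)
    return all(
        evaluate(dp, k) == sum(c * evaluate(p, j) for j, c in enumerate(col))
        for k, col in DUAL_COLUMNS.items()
    )

# By linearity it suffices to test a basis of the quadratics: 1, x, x^2.
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
results = [check(p) for p in basis]
print(results)
```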
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220.8 image of a linear transformation 


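Definition. Let $T : V \to W$ be a linear transformation. Then the image of $T$ is the set

\[ \operatorname{Im}(T) = \{ w \in W \mid w = T(v) \text{ for some } v \in V \}. \]


Properties


1. $T$ is a surjection if and only if $\operatorname{Im}(T) = W$.
2. $\operatorname{Im}(T)$ is a subspace of $W$.
3. If $V$ and $W$ are finite dimensional, then the dimension of $\operatorname{Im}(T)$ is given by the
rank-nullity theorem.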
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220.9 invertible linear transformation 
An invertible linear transformation is a linear transformation $T : V \to W$ which is a
bijection.


If $V$ and $W$ are finite dimensional, the linear transformation $T$ is invertible if and only if the
matrix of $T$ is not singular.
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220.10 kernel of a linear transformation 
Let $T : V \to W$ be a linear transformation. The set of all vectors in $V$ that $T$ maps to 0 is
called the kernel (or nullspace) of $T$, and is denoted $\ker T$.

\[ \ker T = \{ x \in V \mid T(x) = 0 \}. \]


The kernel is a vector subspace of $V$, and its dimension is called the nullity of $T$. Note that
$T$ is injective if and only if $\ker T = \{0\}$.


Suppose $T$ is as above, and $U$ is a vector subspace of $V$. Then, for the restriction $T|_U$, we
have

\[ \ker T|_U = U \cap \ker T. \]


When the transformations are given by means of matrices, the kernel of the matrix $A$ is

\[ \ker A = \{ x \in V \mid Ax = 0 \}. \]
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220.11 linear transformation 


Let $V$ and $W$ be vector spaces over the same field $F$. A linear transformation is a function
$T : V \to W$ such that:

• $T(v + w) = T(v) + T(w)$ for all $v, w \in V$

• $T(\lambda v) = \lambda T(v)$ for all $v \in V$ and $\lambda \in F$


Properties:

• $T(0) = 0$.

• If $G : W \to U$ is a linear transformation then $G \circ T : V \to U$ is also a linear
transformation.

• The kernel $\ker(T) = \{ v \in V \mid T(v) = 0 \}$ is a subspace of $V$.

• The image $\operatorname{Im}(T) = \{ T(v) \mid v \in V \}$ is a subspace of $W$.

• The inverse image $T^{-1}(w)$ is a subspace if and only if $w = 0$.

• A linear transformation is injective if and only if $\ker(T) = \{0\}$.

• If $v \in V$ then $T^{-1}T(v) = v + u$ where $u$ is any element of $\ker(T)$.

• If $T$ is a surjection and $w \in W$ then $TT^{-1}(w) = w$.


See also: 
e Wikipedia, 
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220.12 minimal polynomial (endomorphism) 


Let $T$ be an endomorphism of an $n$-dimensional vector space $V$.


We say that the minimal polynomial, $M_T(X)$, is the unique monic polynomial of minimal degree
such that $M_T(T) = 0$. In other words the matrix representing $M_T(T)$ is the zero matrix.


We say that $P(X)$ is a zero polynomial for $T$ if $P(T)$ is the zero endomorphism.


Firstly, $\operatorname{End}(V)$ is a vector space of dimension $n^2$. Therefore the $n^2 + 1$ vectors $i_V, T, T^2, \ldots, T^{n^2}$
are linearly dependent. So there are coefficients $a_i$, not all zero, such that $\sum_{i=0}^{n^2} a_i T^i = 0$.
We conclude that a non-trivial zero polynomial for $T$ exists. We take $M_T(X)$ to be a zero
polynomial for $T$ of minimal degree with leading coefficient one.


Suppose $P(X)$ is a zero polynomial for $T$; then $M_T(X) \mid P(X)$.


Proof:


By the division algorithm for polynomials, $P(X) = Q(X)M_T(X) + R(X)$ with $\deg R < \deg M_T$.
We note that $R(X)$ is also a zero polynomial for $T$ and, by minimality of $M_T(X)$,
must be just 0. Thus we have shown $M_T(X) \mid P(X)$.


Now suppose $P(X)$ was also both minimal and monic; then we have $M_T(X) = P(X)$, which
gives uniqueness.


The minimal polynomial has a number of interesting properties:


1. The roots are exactly the eigenvalues of the endomorphism.


2. If the minimal polynomial of $T$ splits into linear factors then $T$ is upper-triangular
with respect to some basis.


3. The minimal polynomial of $T$ splits into linear factors each of multiplicity 1 if and only if
$T$ is diagonal with respect to some basis.


It is then a simple corollary of the fundamental theorem of algebra that every endomorphism
of a finite dimensional vector space over $\mathbb{C}$ may be upper-triangularized.


The minimal polynomial is intimately related to the characteristic polynomial for $T$. For let
$\chi_T(X)$ be the characteristic polynomial. Then as was shown earlier $M_T(X) \mid \chi_T(X)$. It is
also a fact that the eigenvalues of $T$ are exactly the roots of $\chi_T$. So when split into linear
factors the only difference between $M_T(X)$ and $\chi_T(X)$ is the algebraic multiplicity of the
roots.

Version: 4 Owner: vitriol Author(s): vitriol 


220.13 symplectic complement 


Definition [1, 2] Let $(V, \omega)$ be a symplectic vector space and let $W$ be a vector subspace of
$V$. Then the symplectic complement of $W$ is

\[ W^\omega = \{ x \in V \mid \omega(x, y) = 0 \text{ for all } y \in W \}. \]


It is easy to see that $W^\omega$ is also a vector subspace of $V$. Depending on the relation between
$W$ and $W^\omega$, $W$ is given different names.


1. If $W \subset W^\omega$, then $W$ is an isotropic subspace (of $V$).
2. If $W^\omega \subset W$, then $W$ is a coisotropic subspace.
3. If $W \cap W^\omega = \{0\}$, then $W$ is a symplectic subspace.
4. If $W = W^\omega$, then $W$ is a Lagrangian subspace.


For the symplectic complement, we have the following [dimension] theorem. 


Theorem [IB] Let (V,w) be a symplectic vector space, and let W be a vector subspace of 
V. Then 
\[ \dim V = \dim W^\omega + \dim W. \]


REFERENCES 
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220.14 trace 


The trace $\operatorname{Tr}(A)$ of a square matrix $A$ is defined to be the sum of the diagonal entries of $A$.
Key formulas for the trace operator:


• $\operatorname{Tr}(A + B) = \operatorname{Tr}(A) + \operatorname{Tr}(B)$
• $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$

The trace $\operatorname{Tr}(T)$ of a linear transformation $T : V \to V$ from any finite dimensional vector space
$V$ to itself is defined to be the trace of any matrix representation of $T$ with respect to a basis
of $V$. This scalar is independent of the choice of basis of $V$, and in fact is equal to the sum
of the eigenvalues of $T$ (over a splitting field of the characteristic polynomial), including
multiplicities.
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Chapter 221 


15A06 — Linear equations 


221.1 Gaussian elimination 


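Gaussian elimination is used to solve a system of linear equations

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1m}x_m &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2m}x_m &= b_2 \\
&\ \,\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nm}x_m &= b_n
\end{aligned}
\]

or equivalently $Ax = b$, where

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}
\]

is a given $n \times m$ matrix of coefficients (elements of some field $K$, usually $\mathbb{R}$ or $\mathbb{C}$ in physical
applications), where

\[ b = (b_1, b_2, \ldots, b_n)^T \]

is a given element of $K^n$, and where

\[ x = (x_1, x_2, \ldots, x_m)^T \]

is the solution, some unknown element of $K^m$.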


The method consists of combining the coefficient matrix $A$ with the right hand side $b$ to get
the “augmented” $n \times (m + 1)$ matrix

\[
[A \; b] = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2m} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm} & b_n
\end{pmatrix}.
\]

A sequence of elementary row operations is then applied to this matrix so as to transform it
to row echelon form. The allowed operations are:


• multiply a row by a nonzero scalar $c$,
• swap two rows,
• add $c$ times one row to another one.


Note that these operations are “legal” because x is a solution of the transformed system if 
and only if it is a solution of the initial system. 


If the number of equations equals the number of variables ($m = n$), and if the coefficient
matrix $A$ is non-singular, then the algorithm will terminate when the augmented matrix has
the following form:

\[
\begin{pmatrix}
a'_{11} & a'_{12} & \cdots & a'_{1n} & b'_1 \\
0 & a'_{22} & \cdots & a'_{2n} & b'_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & a'_{nn} & b'_n
\end{pmatrix}
\]

With these assumptions, there exists a unique solution, which can be obtained from the
above matrix by back substitution.


For the general case, the termination procedure is somewhat more complicated. First recall 
that a matrix is in echelon form if each row has more leading zeros than the rows above it. 
A pivot is the leading non-zero entry of some row. We then have 


• If there is a pivot in the last column, the system is inconsistent; there will be no
solutions.


• If that is not the case, then the general solution will have $d$ degrees of freedom, where
$d$ is the number of columns from 1 to $m$ that have no pivot. To be more precise,
the general solution will have the form of one particular solution plus an arbitrary
linear combination of $d$ linearly independent elements of $K^m$.

In even more prosaic language, the variables in the non-pivot columns are to be considered
“free variables” and should be “moved” to the right-hand side of the equation. The
general solution is then obtained by arbitrarily choosing values of the free variables, and
then solving for the remaining “non-free” variables that reside in the pivot columns.


A variant of Gaussian elimination is Gauss-Jordan elimination. In this variation we reduce to
echelon form, and then if the system proves to be consistent, continue to apply the elementary
row operations until the augmented matrix is in reduced echelon form. This means that not
only does each pivot have all zeroes below it, but that each pivot also has all zeroes above
it.


In essence, Gauss-Jordan elimination performs the back substitution; the values of the
unknowns can be read off directly from the terminal augmented matrix. Not surprisingly,
Gauss-Jordan elimination is slower than Gaussian elimination. It is useful, however, for
solving systems on paper.
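
The procedure described in this entry can be sketched in a few lines. The code below is one possible rendering of Gauss-Jordan elimination, assuming exact rational arithmetic and matrices stored as lists of rows; the function names gauss_jordan and solve are illustrative and not part of the text.

```python
from fractions import Fraction

def gauss_jordan(aug):
    """Reduce an augmented matrix [A b] to reduced echelon form.

    Returns (R, pivots), where pivots lists the pivot column of each non-zero row.
    """
    R = [[Fraction(x) for x in row] for row in aug]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        # Find a row at or below r with a non-zero entry in column c.
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]                  # swap two rows
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]           # multiply a row by a nonzero scalar
        for i in range(rows):                    # add a multiple of one row to another
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [x - f * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots

def solve(A, b):
    """Return (x, d): a particular solution and the number of free variables.

    Returns (None, None) when the system Ax = b is inconsistent.
    """
    m = len(A[0])                                # number of unknowns
    R, pivots = gauss_jordan([list(row) + [bi] for row, bi in zip(A, b)])
    if m in pivots:                              # pivot in the last column
        return None, None
    x = [Fraction(0)] * m                        # free variables set to zero
    for row, c in zip(R, pivots):
        x[c] = row[-1]
    return x, m - len(pivots)

print(solve([[2, 1], [1, 3]], [3, 5]))     # unique solution, no free variables
print(solve([[1, 1], [1, 1]], [1, 2]))     # inconsistent
print(solve([[1, 2, 3]], [6]))             # two free variables
```

Setting the free variables to zero yields one particular solution; the full solution set is that solution plus the kernel, as the text explains.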
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221.2 finite-dimensional linear problem 


Let $L : U \to V$ be a linear transformation, and $v \in V$ an element of the codomain. When both
the domain $U$ and codomain $V$ are finite-dimensional, a linear equation

\[ L(u) = v \]

can be solved by applying the Gaussian elimination algorithm (a.k.a. row reduction). To do
so, we need to have bases $u_1, \ldots, u_n$ and $v_1, \ldots, v_m$ of $U$ and $V$, respectively, so that we can
determine a matrix and a column vector

\[
M = \begin{pmatrix}
M^1_1 & M^1_2 & \cdots & M^1_n \\
M^2_1 & M^2_2 & \cdots & M^2_n \\
\vdots & \vdots & \ddots & \vdots \\
M^m_1 & M^m_2 & \cdots & M^m_n
\end{pmatrix},
\qquad
b = \begin{pmatrix} b^1 \\ b^2 \\ \vdots \\ b^m \end{pmatrix}
\]

that serve to represent $L$ and $v$ relative to the chosen bases, i.e.

\[
\begin{aligned}
L(u_i) &= M^1_i v_1 + M^2_i v_2 + \cdots + M^m_i v_m, \quad i = 1, \ldots, n, \\
v &= b^1 v_1 + b^2 v_2 + \cdots + b^m v_m.
\end{aligned}
\]

We are then able to re-express our linear equation as the following problem: find all $n$-tuples
of scalars $x^1, \ldots, x^n$ such that

\[
\begin{aligned}
M^1_1 x^1 + M^1_2 x^2 + \cdots + M^1_n x^n &= b^1 \\
M^2_1 x^1 + M^2_2 x^2 + \cdots + M^2_n x^n &= b^2 \\
&\ \,\vdots \\
M^m_1 x^1 + M^m_2 x^2 + \cdots + M^m_n x^n &= b^m
\end{aligned}
\]

or quite simply as the matrix-vector equation

\[ Mx = b, \]

where $x$ is the $n$-place column vector of unknowns.


Note that the dimension of the domain is the number of variables, while the dimension
of the codomain is the number of equations. The equation is called under-determined or
over-determined depending on whether the former is greater than the latter, or vice versa.
In general, over-determined systems are inconsistent, while under-determined ones have
multiple solutions. However, this is a “rule of thumb” only, and exceptions are not hard to find.
A full understanding of consistency and multiple solutions relies on the notions of
kernel, image, and rank, and is described by the rank-nullity theorem.


Notes. Elementary applications focus exclusively on the coefficient matrix and the
right-hand side, and neglect to mention the underlying linear mapping. This is unfortunate,
because the concept of a linear equation is much more general than the traditional notion
of “variables and equations”, and relies in an essential way on the idea of a linear mapping.


The attached example regarding polynomial interpolation is a case in point. Polynomial
interpolation is a linear problem, but one that is specified abstractly, rather than in terms
of variables and equations.
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221.3 homogeneous linear problem 


Let $L : U \to V$ be a linear mapping. A linear equation is called homogeneous if it has the
form

\[ L(u) = 0, \quad u \in U. \]


A homogeneous linear problem always has a trivial solution, namely $u = 0$. The key issue
in homogeneous problems is, therefore, the question of the existence of non-trivial solutions,
i.e. whether or not the kernel of $L$ is trivial, or equivalently, whether or not $L$ is injective.
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221.4 linear problem 


Let $L : U \to V$ be a linear mapping, and $v \in V$ an element of the codomain. A linear
equation, a.k.a. a linear problem, is the constraint

\[ L(u) = v, \]

imposed upon elements of the domain $u \in U$. The solution of a linear equation is the set of
$u \in U$ that satisfy the above constraint, i.e. the pre-image $L^{-1}(v)$. The equation is called
inconsistent if no solutions exist, i.e. if the pre-image is the empty set. It is otherwise called
consistent.


The general solution of a linear equation has the form

\[ u = u_p + u_h, \quad u_p, u_h \in U, \]

where

\[ L(u_p) = v \]

is a particular solution and where

\[ L(u_h) = 0 \]

is any solution of the corresponding homogeneous problem, i.e. an element of the kernel of
$L$.


Notes. Elementary treatments of linear algebra focus almost exclusively on finite-dimensional
linear problems. They neglect to mention the underlying mapping, preferring to focus instead
on “variables and equations.” However, the scope of the general concept is considerably wider,
e.g. linear differential equations such as

\[ y' + y = 0. \]
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221.5 reduced row echelon form 


For a matrix to be in reduced row echelon form it has to first satisfy the requirements to be
in row echelon form, and additionally satisfy the following requirements:


1. The first non-zero element in any row must be 1.


2. The first element of value 1 in any row must be the only non-zero value in its column.

An example of a matrix in reduced row echelon form could be

\[
\begin{pmatrix}
0 & 1 & 2 & 6 & 0 & 1 & 0 & 0 & 4 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 4 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]
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221.6 row echelon form 


A matrix is said to be in row echelon form when each row in the matrix starts with more
zeros than the previous row. Rows which are composed completely of zeros will be grouped
at the bottom of the matrix.


Examples of matrices in row echelon form include, 


If several rows have the same number of leading zeros then the matrix is not in row echelon
form unless the rows contain no non-zero values.
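
The two definitions translate directly into predicates on a matrix. The following sketch assumes matrices stored as lists of rows; the function names are illustrative, and the sample matrix E is the example from the reduced row echelon form entry as best it can be read from the source.

```python
def leading_zeros(row):
    """Number of zeros before the first non-zero entry (len(row) if there is none)."""
    for i, x in enumerate(row):
        if x != 0:
            return i
    return len(row)

def is_row_echelon(M):
    """Each non-zero row starts with more zeros than the row above; zero rows come last."""
    width = len(M[0])
    counts = [leading_zeros(row) for row in M]
    for prev, cur in zip(counts, counts[1:]):
        if cur == width:          # an all-zero row may follow any row
            continue
        if cur <= prev:           # also rejects a non-zero row below a zero row
            return False
    return True

def is_reduced_row_echelon(M):
    """Row echelon form in which each leading entry is 1 and is alone in its column."""
    if not is_row_echelon(M):
        return False
    for row in M:
        c = leading_zeros(row)
        if c == len(row):
            continue
        if row[c] != 1 or sum(1 for r in M if r[c] != 0) != 1:
            return False
    return True

E = [[0, 1, 2, 6, 0, 1, 0, 0, 4, 0],
     [0, 0, 0, 0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 0, 0, 0, 1, 0, 4, 1],
     [0, 0, 0, 0, 0, 0, 0, 1, 2, 1],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

print(is_row_echelon(E), is_reduced_row_echelon(E))
print(is_row_echelon([[1, 2], [3, 4]]))            # two rows with no leading zeros
print(is_reduced_row_echelon([[1, 2], [0, 3]]))    # echelon, but leading entry 3
```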
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221.7 under-determined polynomial interpolation 


Consider the following interpolation problem:


Given $x_1, y_1, x_2, y_2 \in \mathbb{R}$ with $x_1 \neq x_2$, to determine all cubic polynomials

\[ p(x) = ax^3 + bx^2 + cx + d, \quad x, a, b, c, d \in \mathbb{R} \]

such that

\[ p(x_1) = y_1, \quad p(x_2) = y_2. \]


This is a linear problem. Let $P_3$ denote the vector space of cubic polynomials. The underlying
linear mapping is the multi-evaluation

\[ E : P_3 \to \mathbb{R}^2, \]

given by

\[ p \mapsto \begin{pmatrix} p(x_1) \\ p(x_2) \end{pmatrix}, \quad p \in P_3. \]

The interpolation problem in question is represented by the equation

\[ E(p) = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \]

where $p \in P_3$ is the unknown. One can recast the problem into the traditional form by
taking standard bases of $P_3$ and $\mathbb{R}^2$ and then seeking all possible $a, b, c, d \in \mathbb{R}$ such that

\[
\begin{pmatrix} x_1^3 & x_1^2 & x_1 & 1 \\ x_2^3 & x_2^2 & x_2 & 1 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}
= \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.
\]

However, it is best to treat this problem at an abstract level, rather than mucking about with
row reduction. The Lagrange interpolation formula gives us a particular solution, namely
the linear polynomial

\[ p_0(x) = \frac{x - x_1}{x_2 - x_1}\, y_2 + \frac{x - x_2}{x_1 - x_2}\, y_1, \quad x \in \mathbb{R}. \]

The general solution of our interpolation problem is therefore given as $p_0 + q$, where $q \in P_3$
is a solution of the homogeneous problem

\[ E(q) = 0. \]

A basis of solutions for the latter is, evidently,

\[ q_1(x) = (x - x_1)(x - x_2), \quad q_2(x) = x\, q_1(x), \quad x \in \mathbb{R}. \]

The general solution to our interpolation problem is therefore

\[ p(x) = \frac{x - x_1}{x_2 - x_1}\, y_2 + \frac{x - x_2}{x_1 - x_2}\, y_1 + (ax + b)(x - x_1)(x - x_2), \quad x \in \mathbb{R}, \]

with $a, b \in \mathbb{R}$ arbitrary. The general under-determined interpolation problem is treated in
an entirely analogous manner.
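
As a quick sanity check of the general solution, one can confirm that every choice of the two free parameters yields a cubic through both data points. This is a minimal sketch using exact rational arithmetic; the function name and the sample data are hypothetical.

```python
from fractions import Fraction

def general_solution(x1, y1, x2, y2, a, b):
    """Return the polynomial function p0(x) + (a x + b)(x - x1)(x - x2)."""
    def p(x):
        # Particular solution: the line through (x1, y1) and (x2, y2).
        p0 = (Fraction(x - x1) / (x2 - x1)) * y2 + (Fraction(x - x2) / (x1 - x2)) * y1
        # Homogeneous part: vanishes at x1 and x2 for every a, b.
        return p0 + (a * x + b) * (x - x1) * (x - x2)
    return p

# Hypothetical data: interpolate (1, 2) and (3, -4) with several choices of a, b.
checks = []
for a, b in [(0, 0), (1, 0), (0, 1), (-2, 5)]:
    p = general_solution(1, 2, 3, -4, a, b)
    checks.append((p(1), p(3)))
print(checks)
```

The pair (a, b) ranges over a two-dimensional space, matching the count of four unknown coefficients minus two interpolation constraints.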
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Chapter 222 


15A09 — Matrix inversion, generalized 
inverses 


222.1 matrix adjoint 


The adjoint (or classical adjoint [1] or adjugate) $A^*$ or $\operatorname{adj}(A)$ of a square matrix $A$ is
given by

\[ A^* = \big( \operatorname{cof}_{ji}(A) \big)_{ij}, \]

where $\operatorname{cof}_{ji}(A)$ denotes the $j, i$th cofactor of $A$.

(Footnote 1: the term “classical adjoint” distinguishes this sense from the more recent sense of the
conjugate transpose over the complex numbers.)

The adjoint is closely related to the matrix inverse, as

\[ A A^* = \det(A) I \]

characterizes $A^*$ for $A$ invertible.

222.1.1 Property

Let $A$ be invertible and let

\[ p(t) = \det(tI - A) = t^n - p_1(A)\, t^{n-1} + \cdots + (-1)^n \det(A) \]

be the characteristic polynomial of $A$, where $p_1(A), p_2(A), \ldots, p_n(A) = \det(A)$ are the
fundamental invariants of $A$ [2].


From $p(A) = 0$ (by the Cayley-Hamilton theorem) we get that

\[ A \big( A^{n-1} - p_1(A) A^{n-2} + \cdots + (-1)^{n-1} p_{n-1}(A) I \big) = (-1)^{n-1} \det(A) I, \]

so we have

\[ A^* = p_{n-1}(A) I - p_{n-2}(A) A + p_{n-3}(A) A^2 - \cdots + (-1)^{n-2} p_1(A) A^{n-2} + (-1)^{n-1} A^{n-1}. \]
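
For a concrete check of the property, take $n = 3$, where the formula reads $A^* = p_2(A) I - p_1(A) A + A^2$. The sketch below computes the adjugate from cofactors and compares it with that expression. It assumes, beyond what the entry states, the standard facts that $p_1(A)$ is the trace and $p_2(A)$ is the sum of the principal $2 \times 2$ minors; the sample matrix is hypothetical.

```python
def minor(M, i, j):
    """Submatrix of M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adjugate(M):
    """Entry (i, j) of the adjugate is the (j, i) cofactor of M."""
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]
n = len(A)
p1 = sum(A[i][i] for i in range(n))                      # trace of A
p2 = sum(A[i][i] * A[j][j] - A[i][j] * A[j][i]           # principal 2x2 minors
         for i in range(n) for j in range(i + 1, n))
A2 = mat_mul(A, A)
from_formula = [[p2 * (i == j) - p1 * A[i][j] + A2[i][j] for j in range(n)]
                for i in range(n)]

print(adjugate(A))
print(from_formula)
print(mat_mul(A, adjugate(A)))    # det(A) times the identity
```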
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222.2 matrix inverse 


222.2.1 Basics 


The inverse of an $n \times n$ matrix $A$ is denoted by $A^{-1}$. The inverse is defined so that

\[ A A^{-1} = A^{-1} A = I_n, \]

where $I_n$ is the $n \times n$ identity matrix.


It should be stressed that only square matrices have inverses proper; however, a matrix of
any dimension may have “left” and “right” inverses (which will not be discussed here).


A precondition for the existence of the matrix inverse $A^{-1}$ is that $\det A \neq 0$ (the determinant
is nonzero), the reason for which we will see in a second.


The general form of the inverse of a matrix $A$ is

\[ A^{-1} = \frac{1}{\det(A)}\, A^*, \]

where $A^*$ is the adjoint of $A$. This can be thought of as a generalization of the $2 \times 2$ formula
given in the next section. However, due to the inclusion of the determinant in the expression,
it is impractical to actually use this to calculate inverses.


(Footnote 2, to section 222.1.1: note that $p_1(A) = \operatorname{tr}(A)$, the trace of $A$.)

This general form also explains why the determinant must be nonzero for invertibility, as
we are dividing through by its value.


222.2.2 Calculating the Inverse 


Method 1: 


An easy way to calculate the inverse of a matrix by hand is to form an augmented matrix
$[A \mid I]$ from $A$ and $I$, then use Gaussian elimination to transform the left half into $I$. At the
end of this procedure, the right half of the augmented matrix will be $A^{-1}$ (that is, you will
be left with $[I \mid A^{-1}]$).


Method 2:


One can calculate the $i, j$th element of the inverse by using the formula

\[ (A^{-1})_{ij} = \operatorname{cof}_{ji}(A) / \det A, \]

where $\operatorname{cof}_{ij}(A)$ is the $i, j$th cofactor of the matrix $A$.


Note that the indices on the left-hand side are swapped relative to the right-hand side. This
has the effect of the transpose which appears in the general form of the inverse in the previous
section.

2-by-2 case:


For the $2 \times 2$ case, the general formula reduces to a memorable shortcut. For the $2 \times 2$ matrix

\[ M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \]

the inverse $M^{-1}$ is always

\[ M^{-1} = \frac{1}{\det M} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \]

where $\det M$ is simply $ad - bc$.
Remarks 


Some caveats: computing the matrix inverse for ill-conditioned matrices is error-prone;
special care must be taken and there are sometimes special algorithms to calculate the inverse
of certain classes of matrices (for example, Hilbert matrices).
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222.2.3 Avoiding the Inverse and Numerical Calculation 


The need to find the matrix inverse depends on the situation: whether done by hand or by
computer, and whether the matrix is simply a part of some equation or expression or not.


Instead of computing the matrix $A^{-1}$ as part of an equation or expression, it is nearly always
better to use a matrix factorization instead. For example, when solving the system $Ax = b$,
actually calculating $A^{-1}$ to get $x = A^{-1}b$ is discouraged. LU-factorization is typically used
instead.


We can even use this fact to speed up our calculation of the inverse by itself. We can cast
the problem as finding $X$ in

\[ AX = B \]

for $n \times n$ matrices $A$, $X$, and $B$ (where $X = A^{-1}$ and $B = I_n$). To solve this, we first find
the LU decomposition of $A$, then iterate over the columns $b_k$ of $B$, solving $Ly = Pb_k$ and $Ux_k = y$
each time ($k = 1 \ldots n$). The resulting values for $x_k$ are then the columns of $A^{-1}$.
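
The column-by-column scheme just described can be sketched as follows. This is an illustrative implementation in floating point with partial pivoting, assuming the permutation $P$ is stored as an index list; the function names are hypothetical and no particular library routine is implied.

```python
def lu_decompose(A):
    """Return (perm, L, U) with P A = L U, where row i of P A is row perm[i] of A."""
    n = len(A)
    U = [list(map(float, row)) for row in A]
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0.0:
            raise ValueError("matrix is singular")
        U[k], U[p] = U[p], U[k]
        L[k], L[p] = L[p], L[k]
        perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = 1.0
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            U[i] = [u - L[i][k] * v for u, v in zip(U[i], U[k])]
    return perm, L, U

def solve_lu(perm, L, U, b):
    """Solve A x = b given the factorization P A = L U."""
    n = len(b)
    pb = [b[i] for i in perm]                 # P b
    y = [0.0] * n
    for i in range(n):                        # forward substitution: L y = P b
        y[i] = pb[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):              # back substitution: U x = y
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

def inverse(A):
    """Invert A by solving A x_k = e_k for each column e_k of the identity."""
    n = len(A)
    perm, L, U = lu_decompose(A)
    cols = [solve_lu(perm, L, U, [float(i == k) for i in range(n)]) for k in range(n)]
    return [list(row) for row in zip(*cols)]  # the x_k are the columns of the inverse

print(inverse([[4, 3], [6, 3]]))
```

The factorization is computed once and reused for all n right-hand sides, which is the source of the saving mentioned in the text.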


222.2.4 Elements of Invertible Matrices 


Typically the matrix elements are members of a field when we are speaking of inverses (e.g.
the reals or the complex numbers). However, the matrix inverse may exist in the case of the
elements being members of a commutative ring, provided that the determinant of the matrix
is a unit in the ring.


222.2.5 References 


• Golub and Van Loan, “Matrix Computations,” Johns Hopkins Univ. Press, 1996.


• “Matrix Math”, http://easyweb.easynet.co.uk/ mrmeanie/matrix/matrices. htm
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Chapter 223 


15A12 — Conditioning of matrices 


223.1 singular 


An $n \times n$ matrix $A$ is called singular if its rows or columns are not linearly independent.
This is equivalent to the following conditions:

• The determinant $\det(A) = 0$.

• The rank of $A$ is less than $n$.

• The nullity of $A$ is greater than zero ($\operatorname{null}(A) > 0$).

• The homogeneous linear system $Ax = 0$ has a non-trivial solution.


More generally, any linear transformation of a vector space with a non-trivial kernel
is a singular transformation.


Because a singular matrix A has det(A) = 0, it is not invertible. 
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Chapter 224 


15A15 — Determinants, permanents, 
other special matrix functions 


224.1 Cayley-Hamilton theorem 


Let $T$ be a linear operator on a finite-dimensional vector space $V$, and let $c(t)$ be the
characteristic polynomial of $T$. Then $c(T) = T_0$, where $T_0$ is the zero transformation. In
other words, $T$ satisfies its own characteristic equation.
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224.2 Cramer’s rule 


Let $Ax = b$ be the matrix form of a system of $n$ linear equations in $n$ unknowns, with $x$ and
$b$ as $n \times 1$ column vectors and $A$ an $n \times n$ matrix. If $\det(A) \neq 0$, then this system has a
unique solution, and for each $i$ ($1 \le i \le n$),

\[ x_i = \frac{\det(M_i)}{\det(A)}, \]

where $M_i$ is $A$ with column $i$ replaced by $b$.
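
Cramer's rule is easy to state in code. The following sketch assumes integer or rational entries and uses a cofactor-expansion determinant, which is adequate only for small systems; the function names are illustrative.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        sub = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(sub)
    return total

def cramer(A, b):
    """Solve A x = b for a square matrix A with non-zero determinant."""
    d = det(A)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    solution = []
    for i in range(len(A)):
        # M_i is A with column i replaced by b.
        Mi = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(Mi), d))
    return solution

print(cramer([[2, 1], [1, 3]], [3, 5]))
```

Each unknown costs one extra determinant, so the rule is mainly of theoretical interest; elimination is far cheaper for larger systems.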


Version: 5 Owner: akrowne Author(s): akrowne 




224.3 cofactor expansion 


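Let $M = (M_{ij})$ be an $n \times n$ matrix with entries in some field of scalars. Let $m_{ij}(M)$ denote
the $(n - 1) \times (n - 1)$ submatrix obtained by deleting row $i$ and column $j$ of $M$, and set

\[ \operatorname{cof}_{ij}(M) = (-1)^{i+j} \det\big(m_{ij}(M)\big). \]

The matrices $m_{ij}(M)$ are called the minors of $M$, and the scalars $\operatorname{cof}_{ij}(M)$ the cofactors.
The usual definition of the determinant

\[ \det(M) = \sum_{\pi \in S_n} \operatorname{sgn}(\pi)\, M_{1\pi_1} M_{2\pi_2} \cdots M_{n\pi_n} \qquad (224.3.1) \]

implies the following representation of the determinant in terms of cofactors. For every
$j = 1, 2, \ldots, n$ we have

\[ \det(M) = \sum_{i=1}^{n} M_{ij} \operatorname{cof}_{ij}(M), \]

\[ \det(M) = \sum_{i=1}^{n} M_{ji} \operatorname{cof}_{ji}(M). \]

These identities are called, respectively, the cofactor expansion of the determinant along the
$j$th column, and the $j$th row.


Example. Consider a general $3 \times 3$ determinant

\[
\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
= a_1 b_2 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_1 b_3 c_2 - a_3 b_2 c_1 - a_2 b_1 c_3.
\]

The above can equally well be expressed as a cofactor expansion along the first row:

\[
\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
= a_1 \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix}
- a_2 \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix}
+ a_3 \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix}
\]
\[ = a_1(b_2 c_3 - b_3 c_2) - a_2(b_1 c_3 - b_3 c_1) + a_3(b_1 c_2 - b_2 c_1); \]

or along the second column:

\[
\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
= -a_2 \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix}
+ b_2 \begin{vmatrix} a_1 & a_3 \\ c_1 & c_3 \end{vmatrix}
- c_2 \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix}
\]
\[ = -a_2(b_1 c_3 - b_3 c_1) + b_2(a_1 c_3 - a_3 c_1) - c_2(a_1 b_3 - a_3 b_1); \]

or indeed as four other such expansions corresponding to rows 2 and 3, and columns 1 and 3.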
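
The claim that all six expansions of a $3 \times 3$ determinant agree can be checked mechanically. The sketch below uses 0-based row and column indices and a hypothetical sample matrix; the function names are illustrative.

```python
def submatrix(M, i, j):
    """Delete row i and column j of M (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by recursive expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum(M[0][j] * cofactor(M, 0, j) for j in range(len(M)))

def cofactor(M, i, j):
    """Signed determinant of the submatrix obtained by deleting row i, column j."""
    return (-1) ** (i + j) * det(submatrix(M, i, j))

def expand_along_row(M, j):
    return sum(M[j][i] * cofactor(M, j, i) for i in range(len(M)))

def expand_along_column(M, j):
    return sum(M[i][j] * cofactor(M, i, j) for i in range(len(M)))

M = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]
values = [expand_along_row(M, j) for j in range(3)] + \
         [expand_along_column(M, j) for j in range(3)]
print(values)
```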
224.4 determinant


Overview 


The determinant is an algebraic operation that transforms a square matrix into a scalar.
This operation has many useful and important properties. For example, the determinant
is zero if and only if the corresponding system of homogeneous linear equations is singular. The
determinant also has important geometric applications, because it describes the area of a
parallelogram, and more generally the volume of a parallelepiped.

The notion of determinant predates matrices and linear transformations. Originally, the
determinant was a number associated to a system of n linear equations in n variables. This
number “determined” whether the system was singular; i.e., possessed multiple solutions.
In this sense, two-by-two determinants were considered by Cardano at the end of the 16th
century and ones of arbitrary size (see the definition below) by Leibniz about 100 years later.


Definition 


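Let $M = (M_{ij})$ be an n × n matrix with entries in some field of scalars.$^1$ The scalar

\[
\begin{vmatrix}
M_{11} & M_{12} & \dots & M_{1n} \\
M_{21} & M_{22} & \dots & M_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
M_{n1} & M_{n2} & \dots & M_{nn}
\end{vmatrix}
= \sum_{\pi \in S_n} \operatorname{sgn}(\pi)\, M_{1\pi_1} M_{2\pi_2} \cdots M_{n\pi_n} \qquad (224.4.1)
\]

is called the determinant of M, or det(M) for short. The index $\pi$ in the above sum varies
over all the permutations of {1, ..., n} (i.e., the elements of the symmetric group $S_n$). Hence,
there are n! terms in the defining sum of the determinant. The symbol $\operatorname{sgn}(\pi)$ denotes the
parity of the permutation; it is ±1 according to whether $\pi$ is an even or odd permutation.

By way of example, the determinant of a 2 × 2 matrix is given by

\[ \begin{vmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{vmatrix} = M_{11} M_{22} - M_{12} M_{21}. \]

There are six permutations of the numbers 1, 2, 3, namely

\[ \overset{+}{123}, \quad \overset{+}{231}, \quad \overset{+}{312}, \quad \overset{-}{132}, \quad \overset{-}{321}, \quad \overset{-}{213}; \]

the overset sign indicates the permutation’s signature. Accordingly, the 3 × 3 determinant is
a sum of the following 6 terms:

\[
\begin{vmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \\ M_{31} & M_{32} & M_{33} \end{vmatrix}
= M_{11} M_{22} M_{33} + M_{12} M_{23} M_{31} + M_{13} M_{21} M_{32}
- M_{11} M_{23} M_{32} - M_{13} M_{22} M_{31} - M_{12} M_{21} M_{33}.
\]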


$^1$ Most scientific and geometric applications deal with matrices made up of real or complex numbers.
However, most properties of the determinant remain valid for matrices with entries in a commutative ring.
Remarks and important properties

1. The determinant operation converts matrix multiplication into scalar multiplication;
\[ \det(AB) = \det(A) \det(B), \]
where A, B are square matrices of the same size.

2. The determinant operation is multi-linear and anti-symmetric with respect to the
matrix’s rows and columns. See the multi-linearity attachment for more details.

3. The determinant of a lower triangular or an upper triangular matrix is the product of
the diagonal entries, since all the other summands in (224.4.1) are zero.

4. Similar matrices have the same determinant. To be more precise, let A and X be
square matrices with X invertible. Then,
\[ \det(X A X^{-1}) = \det(A). \]
In particular, if we let X be the matrix representing a change of basis, this shows that
the determinant is independent of the basis. The same is true of the trace of a matrix.
In fact, the whole characteristic polynomial of an endomorphism is definable without
using a basis or a matrix, and it turns out that the determinant and trace are two of
its coefficients.

5. The determinant of a matrix A is zero if and only if A is singular; that is, if there
exists a non-trivial solution to the homogeneous equation
\[ Ax = 0. \]

6. The transpose operation does not change the determinant:
\[ \det A^T = \det A. \]

7. The determinant of a diagonalizable transformation is equal to the product of its eigenvalues,
counted with multiplicities.

8. The determinant is homogeneous of degree n. This means that
\[ \det(kM) = k^n \det M, \]
where k is a scalar.
224.5 determinant as a multilinear mapping


Let $M = (M_{ij})$ be an n × n matrix with entries in a field K. The matrix M is really the
same thing as a list of n column vectors of size n. Consequently, the determinant operation
may be regarded as a mapping

\[ \det : \underbrace{K^n \times \dots \times K^n}_{n \text{ times}} \to K. \]

The determinant of a matrix M is then defined to be $\det(M_1, \dots, M_n)$, where $M_j \in K^n$
denotes the j-th column of M.

Starting with the definition

\[ \det(M_1, \dots, M_n) = \sum_{\pi \in S_n} \operatorname{sgn}(\pi)\, M_{1\pi_1} M_{2\pi_2} \cdots M_{n\pi_n} \qquad (224.5.1) \]

the following properties are easily established:

1. the determinant is multilinear;

2. the determinant is anti-symmetric;

3. the determinant of the identity matrix is 1.


These three properties uniquely characterize the determinant, and indeed can — some would 
say should — be used as the definition of the determinant operation. 


Let us prove this. We proceed by representing elements of $K^n$ as linear combinations of

\[
e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad
e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad
e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix},
\]

the standard basis of $K^n$. Let M be an n × n matrix. The j-th column is represented as
$\sum_i M_{ij} e_i$; whence using multilinearity

\[
\det(M) = \det\Bigl( \sum_i M_{i1} e_i,\ \sum_i M_{i2} e_i,\ \dots,\ \sum_i M_{in} e_i \Bigr)
= \sum_{i_1, \dots, i_n = 1}^{n} M_{i_1 1} M_{i_2 2} \cdots M_{i_n n} \det(e_{i_1}, e_{i_2}, \dots, e_{i_n}).
\]

The anti-symmetry assumption implies that the expressions $\det(e_{i_1}, e_{i_2}, \dots, e_{i_n})$ vanish if
any two of the indices $i_1, \dots, i_n$ coincide. If all n indices are distinct,

\[ \det(e_{i_1}, e_{i_2}, \dots, e_{i_n}) = \pm \det(e_1, \dots, e_n), \]

the sign in the above expression being determined by the number of transpositions
required to rearrange the list $(i_1, \dots, i_n)$ into the list (1, ..., n). The sign is therefore the
parity of the permutation $(i_1, \dots, i_n)$. Since we also assume that

\[ \det(e_1, \dots, e_n) = 1, \]

we now recover the original definition (224.5.1).
224.6 determinants of some matrices of special form


Suppose A is an n × n square matrix, u, v are two column n-vectors, and $\alpha$ is a scalar. Then

\[ \det(A + u v^T) = \det A + v^T \operatorname{adj} A \, u, \]

\[ \det \begin{pmatrix} A & u \\ v^T & \alpha \end{pmatrix} = \alpha \det A - v^T \operatorname{adj} A \, u, \]

where adj A is the adjugate of A.


REFERENCES 


1. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical So- 
ciety, 1994. 
224.7 example of Cramer’s rule


Say we want to solve the system

\[
\begin{aligned}
3x + 2y + z - 2w &= 4 \\
2x - y + 2z - 5w &= 15 \\
4x + 2y - w &= 1 \\
3x - 2z - 4w &= 1.
\end{aligned}
\]

The associated matrix is

\[
\begin{pmatrix}
3 & 2 & 1 & -2 \\
2 & -1 & 2 & -5 \\
4 & 2 & 0 & -1 \\
3 & 0 & -2 & -4
\end{pmatrix}
\]

whose determinant is $\Delta = -65$. Since the determinant is non-zero, we can use Cramer’s rule.
To obtain the value of the k-th variable, we replace the k-th column of the matrix above by
the column vector

\[ \begin{pmatrix} 4 \\ 15 \\ 1 \\ 1 \end{pmatrix}; \]

the determinant of the obtained matrix is divided by $\Delta$ and the resulting value is the wanted
solution.
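So

\[
x = \frac{\Delta_1}{\Delta} = \frac{1}{-65}
\begin{vmatrix} 4 & 2 & 1 & -2 \\ 15 & -1 & 2 & -5 \\ 1 & 2 & 0 & -1 \\ 1 & 0 & -2 & -4 \end{vmatrix}
= \frac{-65}{-65} = 1,
\]

\[
y = \frac{\Delta_2}{\Delta} = \frac{1}{-65}
\begin{vmatrix} 3 & 4 & 1 & -2 \\ 2 & 15 & 2 & -5 \\ 4 & 1 & 0 & -1 \\ 3 & 1 & -2 & -4 \end{vmatrix}
= \frac{130}{-65} = -2,
\]

\[
z = \frac{\Delta_3}{\Delta} = \frac{1}{-65}
\begin{vmatrix} 3 & 2 & 4 & -2 \\ 2 & -1 & 15 & -5 \\ 4 & 2 & 1 & -1 \\ 3 & 0 & 1 & -4 \end{vmatrix}
= \frac{-195}{-65} = 3,
\]

\[
w = \frac{\Delta_4}{\Delta} = \frac{1}{-65}
\begin{vmatrix} 3 & 2 & 1 & 4 \\ 2 & -1 & 2 & 15 \\ 4 & 2 & 0 & 1 \\ 3 & 0 & -2 & 1 \end{vmatrix}
= \frac{65}{-65} = -1.
\]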
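
The worked example can be reproduced mechanically. The sketch below is one possible implementation, assuming exact rational arithmetic and a determinant computed by cofactor expansion along the first row; the names `det` and `cramer` are illustrative and not taken from the original entry.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def cramer(a, b):
    """Solve a x = b by Cramer's rule; requires det(a) != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    solution = []
    for k in range(len(a)):
        # M_k is a with column k replaced by b.
        m_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(m_k), d))
    return solution


A = [[3, 2, 1, -2],
     [2, -1, 2, -5],
     [4, 2, 0, -1],
     [3, 0, -2, -4]]
b = [4, 15, 1, 1]
delta = det(A)
x, y, z, w = cramer(A, b)
```

Cramer's rule needs n + 1 determinants, so for larger systems elimination methods are usually preferred; the rule is mainly of theoretical interest.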
224.8 proof of Cramer’s rule


Since det(A) ≠ 0, by properties of the determinant we know that A is invertible.

We claim that this implies that the equation Ax = b has a unique solution. Note that $A^{-1}b$
is a solution since $A(A^{-1}b) = (AA^{-1})b = b$, so we know that a solution exists.

Let s be an arbitrary solution to the equation, so As = b. But then $s = (A^{-1}A)s =
A^{-1}(As) = A^{-1}b$, so we see that $A^{-1}b$ is the only solution.

For each integer i, 1 ≤ i ≤ n, let $a_i$ denote the ith column of A, let $e_i$ denote the ith column
of the identity matrix $I_n$, and let $X_i$ denote the matrix obtained from $I_n$ by replacing column
i with the column vector x.

We know that for any matrices A, B the kth column of the product AB is simply the
product of A and the kth column of B. Also observe that $Ae_k = a_k$ for k = 1, ..., n. Thus,
by multiplication, we have:

\[
\begin{aligned}
AX_i &= A(e_1, \dots, e_{i-1}, x, e_{i+1}, \dots, e_n) \\
&= (Ae_1, \dots, Ae_{i-1}, Ax, Ae_{i+1}, \dots, Ae_n) \\
&= (a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n) \\
&= M_i .
\end{aligned}
\]

Since $X_i$ is $I_n$ with column i replaced with x, computing the determinant of $X_i$ with
cofactor expansion gives:

\[ \det(X_i) = (-1)^{i+i} x_i \det(I_{n-1}) = 1 \cdot x_i \cdot 1 = x_i . \]

Thus by the multiplicative property of the determinant,

\[ \det(M_i) = \det(AX_i) = \det(A) \det(X_i) = \det(A)\, x_i , \]

and so $x_i = \frac{\det(M_i)}{\det(A)}$, as required.
224.9 proof of cofactor expansion


Let $M \in \operatorname{mat}_N(K)$ be an n × n-matrix with entries $m_{ij}$ from a commutative field K. Let $e_1, \dots, e_n$
denote the vectors of the canonical basis of $K^n$. For the proof we need the following lemma.

Lemma. Let $M^*_{ij}$ be the matrix generated by replacing the i-th row of M by $e_j$. Then

\[ \det M^*_{ij} = (-1)^{i+j} \det M_{ij} \]

where $M_{ij}$ is the (n − 1) × (n − 1)-matrix obtained from M by removing its i-th row and
j-th column.

Proof: By adding appropriate multiples of the i-th row of $M^*_{ij}$ to its remaining rows we
obtain a matrix with 1 at position (i, j) and 0 at positions (k, j) (k ≠ i). Now we apply the
permutation

\[ (12) \circ (23) \circ \dots \circ ((i-1)\,i) \]

to rows and

\[ (12) \circ (23) \circ \dots \circ ((j-1)\,j) \]

to columns of the matrix. The matrix now looks like this:

• Row/column 1 are the vector $e_1$;

• under row 1 and right of column 1 is the matrix $M_{ij}$.

Since the determinant has changed its sign i + j − 2 times, we have

\[ \det M^*_{ij} = (-1)^{i+j} \det M_{ij} . \]

Note also that in computing the determinant of $M^*_{ij}$, only those permutations $\pi \in S_n$ are effective
where $\pi(i) = j$.

Now we start out with

\[
\begin{aligned}
\det M &= \sum_{\pi \in S_n} \operatorname{sgn} \pi \Bigl( \prod_{j=1}^{n} m_{j\pi(j)} \Bigr) \\
&= \sum_{k=1}^{n} m_{ik} \Bigl( \sum_{\substack{\pi \in S_n \\ \pi(i) = k}} \operatorname{sgn} \pi
\Bigl( \prod_{1 \le j < i} m_{j\pi(j)} \Bigr) \cdot 1 \cdot \Bigl( \prod_{i < j \le n} m_{j\pi(j)} \Bigr) \Bigr).
\end{aligned}
\]

From the previous lemma, it follows that the inner sum associated with $m_{ik}$ is the determinant
of $M^*_{ik}$. So we have

\[ \det M = \sum_{k=1}^{n} m_{ik} \bigl( (-1)^{i+k} \det M_{ik} \bigr). \]
224.10 resolvent matrix


The resolvent matrix of a matrix A is defined as

\[ R_A(s) = (sI - A)^{-1} . \]

Note: I is the identity matrix and s is a complex variable. Also note that $R_A(s)$ is undefined
on Sp(A) (the spectrum of A).
Chapter 225


15A18 — Eigenvalues, singular values, 
and eigenvectors 


225.1 Jordan canonical form theorem 


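Let V be a finite-dimensional vector space over a field F and t : V → V be a linear transformation.
Then, if the characteristic polynomial factorizes completely over F, there will exist a basis
of V with respect to which the matrix of t is of the form

\[
\begin{pmatrix}
J_1 & 0 & \dots & 0 \\
0 & J_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & J_k
\end{pmatrix}
\]

where each $J_i$ is a reduced (Jordan) matrix in which $\lambda = \lambda_i$.

A Jordan block or Jordan matrix is a matrix of the form

\[
\begin{pmatrix}
\lambda & 1 & 0 & \dots & 0 \\
0 & \lambda & 1 & \dots & 0 \\
0 & 0 & \lambda & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & 1 \\
0 & 0 & 0 & \dots & \lambda
\end{pmatrix}
\]

with a constant value $\lambda$ along the diagonal and 1’s on the superdiagonal. Some texts place
the 1’s on the subdiagonal instead.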
225.2 Lagrange multiplier method


The Lagrange multiplier method is used when one needs to find the extreme values of a
function whose domain is constrained to lie within a particular subset of the domain.

Method

Suppose that f(x) and $g_i(x)$, i = 1, ..., m ($x \in \mathbb{R}^n$) are differentiable functions that map
$\mathbb{R}^n \to \mathbb{R}$, and we want to solve

\[ \min f(x) \quad \text{such that} \quad g_i(x) = 0, \quad i = 1, \dots, m . \]

By a calculus theorem, at a constrained minimum the gradient of f, $\nabla f$, must satisfy the following equation
for some scalars $\lambda_i$ (the Lagrange multipliers):

\[ \nabla f = \sum_{i=1}^{m} \lambda_i \nabla g_i . \]

Note that this, together with the constraints, is equivalent to finding the stationary points of

\[ f(x) - \sum_{i=1}^{m} \lambda_i\, g_i(x) \]

with respect to x and $\lambda_i$, without restrictions.
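
As an illustration, here is a small worked instance of the method. The objective and the single constraint are chosen for this sketch and do not come from the original entry.

```latex
\begin{aligned}
&\min\; f(x,y) = x^2 + y^2 \quad \text{such that} \quad g(x,y) = x + y - 1 = 0, \\
&\nabla f = \lambda \nabla g \;\Longrightarrow\; (2x,\, 2y) = \lambda\,(1,\, 1)
  \;\Longrightarrow\; x = y = \tfrac{\lambda}{2}, \\
&g(x,y) = 0 \;\Longrightarrow\; \lambda = 1, \quad x = y = \tfrac{1}{2}, \quad
  f\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{2} .
\end{aligned}
```

Geometrically, the point (1/2, 1/2) is the point of the line x + y = 1 closest to the origin.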
225.3 Perron-Frobenius theorem


Let A be a nonnegative matrix. Denote its spectrum by $\sigma(A)$. Then the spectral radius
$\rho(A)$ is an eigenvalue, that is, $\rho(A) \in \sigma(A)$, and is associated to a nonnegative eigenvector.

If, in addition, A is an irreducible matrix, then $|\rho(A)| \ge |\lambda|$ for all $\lambda \in \sigma(A)$, $\lambda \ne \rho(A)$, and
$\rho(A)$ is a simple eigenvalue associated to a positive eigenvector.

If, in addition, A is a primitive matrix, then $\rho(A) > |\lambda|$ for all $\lambda \in \sigma(A)$, $\lambda \ne \rho(A)$.
225.4 characteristic equation


Let A be an n × n matrix. The characteristic equation of A is defined by

\[
\det(A - \lambda I) =
\begin{vmatrix}
a_{11} - \lambda & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} - \lambda & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn} - \lambda
\end{vmatrix}
= 0 .
\]

This forms an nth-degree polynomial in $\lambda$, the solutions to which are called the eigenvalues
of A.
225.5 eigenvalue


Let V be a vector space over a field k, and let A be an endomorphism of V (meaning a
linear mapping of V into itself). A scalar $\lambda \in k$ is said to be an eigenvalue of A if there is
a nonzero $x \in V$ for which

\[ Ax = \lambda x . \qquad (225.5.1) \]

Geometrically, one thinks of a vector whose direction is unchanged by the action of A, but
whose magnitude is multiplied by $\lambda$.

If V is finite dimensional, elementary linear algebra shows that there are several equivalent
definitions of an eigenvalue:

(2) The linear mapping $B = \lambda I - A$, i.e. $B : x \mapsto \lambda x - Ax$, has no inverse.

(3) B is not injective.

(4) B is not surjective.

(5) det(B) = 0, i.e. $\det(\lambda I - A) = 0$.

But if V is of infinite dimension, (5) has no meaning and the conditions (2), (3), and (4) are
not equivalent to the defining condition (225.5.1). A scalar $\lambda$ satisfying (2) (called a spectral value of A) need not be
an eigenvalue. Consider for example the complex vector space V of all sequences $(x_n)$ of
complex numbers with the obvious operations, and the map A : V → V given by

\[ A(x_1, x_2, x_3, \dots) = (0, x_1, x_2, x_3, \dots) . \]

Zero is a spectral value of A, but clearly not an eigenvalue.


Now suppose again that V is of finite dimension, say n. The function

\[ \chi(\lambda) = \det(B) \]

is a polynomial of degree n over k in the variable $\lambda$, called the characteristic polynomial
of the endomorphism A. (Note that some writers define the characteristic polynomial as
$\det(A - \lambda I)$ rather than $\det(\lambda I - A)$, but the two have the same zeros.)

If k is C or any other algebraically closed field, or if k = R and n is odd, then $\chi$ has at least
one zero, meaning that A has at least one eigenvalue. In no case does A have more than n
eigenvalues.

Although we didn’t need to do so here, one can compute the coefficients of $\chi$ by introducing a
basis of V and the corresponding matrix for B. Unfortunately, computing n × n determinants
and finding roots of polynomials of degree n are computationally messy procedures for even
moderately large n, so for most practical purposes variations on this naive scheme are needed.

See the eigenvalue problem for more information.

If k = C but the coefficients of $\chi$ are real (and in particular if V has a basis for which the
matrix of A has only real entries), then the non-real eigenvalues of A appear in conjugate
pairs. For example, if n = 2 and, for some basis, A has the matrix

\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \]

then $\chi(\lambda) = \lambda^2 + 1$, with the two zeros $\pm i$.
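
For a 2 × 2 matrix the characteristic polynomial is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, so its zeros follow from the quadratic formula. The sketch below assumes this special case only; the function name `eigenvalues_2x2` is illustrative.

```python
import cmath


def eigenvalues_2x2(a):
    """Zeros of lambda^2 - tr(a) lambda + det(a) for a 2 x 2 matrix a."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)  # complex square root of the discriminant
    return ((tr + root) / 2, (tr - root) / 2)


A = [[0, -1], [1, 0]]
lam1, lam2 = eigenvalues_2x2(A)  # the conjugate pair i and -i
```

This is exactly the naive scheme described above, and it does not scale to large n.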


Eigenvalues are of relatively little importance in connection with an infinite-dimensional
vector space, unless that space is endowed with some additional structure, typically that of
a Banach space, a Hilbert space, or a normed algebra. But in those cases the notion is of


great value in physics, engineering, and mathematics proper. Look for “spectral theory” for 
more on that subject. 


The word “eigenvalue” derives from the German “eigenwert”, meaning “proper value”. 
225.6 eigenvalue


Let V be a vector space over k and T a linear operator on V. An eigenvalue for T is a scalar
$\lambda$ (that is, an element of k) such that $T(z) = \lambda z$ for some nonzero vector $z \in V$. In
that case, we also say that z is an eigenvector of T.

This can also be expressed as follows: $\lambda$ is an eigenvalue for T if the kernel of $T - \lambda I$ is
nontrivial.

A linear operator can have several eigenvalues (but no more than the dimension of the space).
Eigenvectors corresponding to different eigenvalues are linearly independent.


Version: 2 Owner: drini Author(s): drini, apmxi 
Chapter 226


15A21 — Canonical forms, reductions, 
classification 


226.1 companion matrix 


Given a monic polynomial $p(x) = x^n + a_{n-1}x^{n-1} + \dots + a_1 x + a_0$ the companion matrix of
p(x), denoted $C_{p(x)}$, is the n × n matrix with 1’s down the first subdiagonal and minus the
coefficients of p(x) down the last column, or alternatively, as the transpose of this matrix.
Adopting the first convention this is simply

\[
C_{p(x)} =
\begin{pmatrix}
0 & 0 & \dots & 0 & -a_0 \\
1 & 0 & \dots & 0 & -a_1 \\
0 & 1 & \dots & 0 & -a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \dots & 1 & -a_{n-1}
\end{pmatrix} .
\]

Regardless of which convention is used the minimal polynomial of $C_{p(x)}$ equals the characteristic polynomial
of $C_{p(x)}$ and is just p(x).
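
Since p(x) is the characteristic polynomial of its companion matrix, the Cayley-Hamilton theorem gives $p(C_{p(x)}) = 0$. The sketch below builds the matrix under the first convention and evaluates p at it by Horner's scheme; the coefficient ordering `[a_0, ..., a_{n-1}]` and all function names are assumptions made for this illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def companion(coeffs):
    """Companion matrix of x^n + a_{n-1} x^{n-1} + ... + a_0.

    coeffs = [a_0, a_1, ..., a_{n-1}]; 1's on the first subdiagonal,
    minus the coefficients down the last column.
    """
    n = len(coeffs)
    c = [[0] * n for _ in range(n)]
    for i in range(1, n):
        c[i][i - 1] = 1
    for i in range(n):
        c[i][n - 1] = -coeffs[i]
    return c


def poly_at_matrix(coeffs, c):
    """Evaluate the monic polynomial at the matrix c by Horner's scheme."""
    n = len(c)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    result = identity  # leading coefficient 1
    for a in reversed(coeffs):
        prod = mat_mul(result, c)
        result = [[prod[i][j] + a * identity[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# p(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
coeffs = [-6, 11, -6]
C = companion(coeffs)
value = poly_at_matrix(coeffs, C)  # expected to be the zero matrix
```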
226.2 eigenvalues of an involution


Proof. For the first claim suppose $\lambda$ is an eigenvalue corresponding to an eigenvector x of
A. That is, $Ax = \lambda x$. Then $A^2 x = \lambda Ax$, so $x = \lambda^2 x$. As an eigenvector, x is non-zero, and
$\lambda = \pm 1$. Now (1) follows since the determinant is the product of the eigenvalues.

For property (2), note that $A - \lambda I = -\lambda A (A - \tfrac{1}{\lambda} I)$, where A and $\lambda$ are as above. Taking
the determinant of both sides, and using part (1), and the properties of the determinant,
yields

\[ \det(A - \lambda I) = \pm \lambda^n \det\Bigl(A - \frac{1}{\lambda} I\Bigr) . \]

Property (2) follows.
226.3 linear involution


Definition. Let V be a vector space. A linear involution is a linear operator L : V → V
such that $L^2$ is the identity operator on V. An equivalent definition is that a linear involution
is a linear operator that equals its own inverse.

Theorem 1. Let V be a vector space and let A : V → V be a linear involution. Then the
eigenvalues of A are ±1. Further, if V is $\mathbb{C}^n$, and A is an n × n complex matrix, then we have
that:

1. det A = ±1.

2. The characteristic polynomial of A, $p(\lambda) = \det(A - \lambda I)$, is a reciprocal polynomial, i.e.,

\[ p(\lambda) = \pm \lambda^n p(1/\lambda) . \]

(proof.)

The next theorem gives a correspondence between involution operators and projection operators.

Theorem 2. Let L and P be linear operators on a vector space V, and let I be the identity
operator on V. If L is an involution then the operators $\tfrac{1}{2}(I \pm L)$ are projection operators.
Conversely, if P is a projection operator, then the operators $\pm(2P - I)$ are involutions.

The next theorem is given as exercise IV.10.14 in [2].

Theorem 3. Let A be a complex n × n matrix. Then any two of the below conditions
imply the third:

1. A is a Hermitian matrix.

2. A is a unitary matrix.

3. The mapping $A : \mathbb{C}^n \to \mathbb{C}^n$ is an involution.

The proofs of theorems 2 and 3 are straightforward calculations.


REFERENCES 


1. M. C. Pease, Methods of Matrix Algebra, Academic Press, 1965 
226.4 normal matrix


A complex matrix $A \in \mathbb{C}^{n \times n}$ is said to be normal if $A^H A = A A^H$, where H denotes the
conjugate transpose.

Similarly, a real matrix $A \in \mathbb{R}^{n \times n}$ is said to be normal if $A^T A = A A^T$, where T denotes
the transpose.

properties:

• Equivalently a complex matrix $A \in \mathbb{C}^{n \times n}$ is said to be normal if it satisfies $[A, A^H] = 0$
where [,] is the commutator bracket.

• Equivalently a real matrix $A \in \mathbb{R}^{n \times n}$ is said to be normal if it satisfies $[A, A^T] = 0$
where [,] is the commutator bracket.

• Let A be a square matrix with real (or possibly complex) entries. It follows from Schur’s inequality that
if A is a normal matrix then $\sum_{i=1}^{n} |\lambda_i|^2 = \operatorname{trace} A^* A$, where * is the conjugate transpose
and $\lambda_i$ are the eigenvalues of A.

• Let A be a complex square matrix. Then A is a diagonal matrix if and only if A is a normal
matrix and A is a triangular matrix (see theorem for normal triangular matrices).

examples:

• $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$ where $a, b \in \mathbb{R}$

see also:

• Wikipedia, normal matrix
226.5 projection


A linear transformation P : V → V of a vector space V is called a projection if it acts like
the identity on its image. This condition can be more succinctly expressed by the equation

\[ P^2 = P . \qquad (226.5.1) \]

Proposition 10. If P : V → V is a projection, then its image and the kernel are complementary subspaces,
namely

\[ V = \ker P \oplus \operatorname{img} P . \qquad (226.5.2) \]

Suppose that P is a projection. Let $v \in V$ be given, and set

\[ u = v - Pv . \]

The projection condition (226.5.1) then implies that $u \in \ker P$, and we can write v as the
sum of a kernel vector and an image vector:

\[ v = u + Pv . \]

This decomposition is unique, because the intersection of the image and the kernel is the
trivial subspace. Indeed, suppose that $v \in V$ is in both the image and the kernel of P. Then,
Pv = v and Pv = 0, and hence v = 0. QED
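Conversely, every direct sum decomposition

\[ V = V_1 \oplus V_2 \]

corresponds to a projection P : V → V defined by

\[ Pv = \begin{cases} v, & v \in V_1, \\ 0, & v \in V_2, \end{cases} \]

and extended linearly to all of V.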


Specializing somewhat, suppose that the ground field is R or C and that V is equipped with
a positive-definite inner product. In this setting we call an endomorphism P : V → V an
orthogonal projection if it is self-dual,

\[ P^* = P , \]

in addition to satisfying the projection condition (226.5.1).

Proposition 11. The kernel and image of an orthogonal projection are orthogonal subspaces.

Let $u \in \ker P$ and $v \in \operatorname{img} P$ be given. Since P is self-dual we have

\[ 0 = \langle Pu, v \rangle = \langle u, Pv \rangle = \langle u, v \rangle . \]

QED

Thus we see that an orthogonal projection P projects a $v \in V$ onto Pv in an orthogonal
fashion, i.e.

\[ \langle v - Pv, u \rangle = 0 \]

for all $u \in \operatorname{img} P$.
226.6 quadratic form


Let U be a vector space over a field k, whose characteristic is not equal to 2. A function
Q : U → k is called a quadratic form if there exists a symmetric bilinear form B : U × U → k
such that

\[ Q(u) = B(u, u), \quad u \in U . \]

Thus, every symmetric bilinear form determines a quadratic form. The reverse is also true.
Let B be a symmetric bilinear form and Q the corresponding quadratic form. A straightforward
calculation shows that

\[ 2B(u, v) = Q(u + v) - Q(u) - Q(v) . \]

The above relation is called the polarization identity. It shows that a quadratic form Q fully
determines the corresponding B.

Next, suppose that U is finite-dimensional and let $u_1, \dots, u_n$ be a basis. Every quadratic
form Q is represented relative to this basis by the symmetric matrix of coefficients

\[ A_{ij} = B(u_i, u_j), \quad i, j = 1, \dots, n, \]

where B is the corresponding bilinear form. Letting $x^1, \dots, x^n \in k$ denote the coordinates
of an arbitrary $u \in U$ relative to this basis, i.e.

\[ u = x^1 u_1 + \dots + x^n u_n , \]

we have

\[ Q(u) = \sum_{i,j=1}^{n} A_{ij}\, x^i x^j . \]

Writing $x = (x^1, \dots, x^n)^T \in k^n$ for the corresponding coordinate vector we can write the
above simply as

\[ Q(u) = x^T A x . \]

In the case where k is the field of real numbers, we say that a quadratic form is positive definite
or negative definite if the same can be said of the corresponding bilinear
form.
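
The polarization identity can be checked in coordinates. The sketch below assumes a symmetric integer matrix A representing Q, recovers B from Q alone, and compares the result with $u^T A v$; the function names are illustrative.

```python
from fractions import Fraction


def quad(a, x):
    """Q(x) = x^T A x for a matrix a given as a list of rows."""
    n = len(x)
    return sum(a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def bilinear(a, u, v):
    """B(u, v) = u^T A v."""
    n = len(u)
    return sum(a[i][j] * u[i] * v[j] for i in range(n) for j in range(n))


def bilinear_from_quad(a, u, v):
    """Recover B(u, v) from Q via 2 B(u, v) = Q(u + v) - Q(u) - Q(v)."""
    s = [ui + vi for ui, vi in zip(u, v)]
    return Fraction(quad(a, s) - quad(a, u) - quad(a, v), 2)


A = [[2, 1], [1, 3]]  # symmetric, so it represents a symmetric bilinear form
u = [1, 2]
v = [3, -1]
recovered = bilinear_from_quad(A, u, v)
direct = bilinear(A, u, v)
```

The division by 2 is the reason the characteristic of k is required to differ from 2.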
Chapter 227


15A23 — Factorization of matrices 


227.1 QR decomposition 


Orthogonal matrix triangularization (QR decomposition) reduces a real m × n matrix A
with m ≥ n and full rank to a much simpler form. It is numerically stable, because
orthogonal transformations do not amplify errors caused by machine roundoffs. A suitably chosen orthogonal matrix Q will
triangularize the given matrix:

\[ A = Q \begin{pmatrix} R \\ 0 \end{pmatrix} \]

with the n × n upper triangular matrix R. One only has then to solve the triangular system
Rx = Pb, where P consists of the first n rows of $Q^T$.

The least squares problem $Ax \approx b$ is easy to solve with A = QR and Q a matrix with orthonormal columns.
The solution

\[ x = (A^T A)^{-1} A^T b \]

becomes

\[ x = (R^T Q^T Q R)^{-1} R^T Q^T b = (R^T R)^{-1} R^T Q^T b = R^{-1} Q^T b . \]

This is a matrix-vector multiplication $Q^T b$, followed by the solution of the triangular system
$Rx = Q^T b$ by back-substitution. The QR factorization saves us the formation of $A^T A$ and
the solution of the normal equations.
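
The procedure just described can be sketched in a few lines. The version below is an illustration only: it assumes the modified Gram-Schmidt process as the factorization method (one of several possible choices), full column rank, and plain floating-point lists; the function names are hypothetical.

```python
import math


def qr_gram_schmidt(a):
    """Thin QR factorization by modified Gram-Schmidt.

    Returns (q_cols, r): q_cols is a list of n orthonormal column vectors
    of length m, and r is the n x n upper triangular factor.
    """
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(m))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(m)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / r[j][j] for x in v])
    return q_cols, r


def solve_least_squares(a, b):
    """Solve min ||a x - b|| via R x = Q^T b and back-substitution."""
    q_cols, r = qr_gram_schmidt(a)
    n = len(r)
    y = [sum(q[i] * b[i] for i in range(len(b))) for q in q_cols]  # Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(r[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / r[i][i]
    return x


# Fit y = c0 + c1 * t to the points (0, 1), (1, 3), (2, 5).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 3.0, 5.0]
c0, c1 = solve_least_squares(A, b)
```

For ill-conditioned problems, Householder reflections are generally considered more robust than Gram-Schmidt.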
Many different methods exist for the QR decomposition, e.g. the Householder transformation,
the Givens rotation, or the Gram-Schmidt decomposition.


References


• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Chapter 228


15A30 — Algebraic systems of matrices 


228.1 ideals in matrix algebras 


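Let R be a ring with 1. Consider the ring $M_{n \times n}(R)$ of n × n-matrices with entries taken
from R.

It will be shown that there exists a one-to-one correspondence between the ideals of R and
the ideals of $M_{n \times n}(R)$.

For 1 ≤ i, j ≤ n, let $E_{ij}$ denote the n × n-matrix having entry 1 at position (i, j) and 0 in
all other places. It can be easily checked that

\[
E_{ij} \cdot E_{kl} =
\begin{cases}
0 & \text{iff } k \ne j, \\
E_{il} & \text{otherwise.}
\end{cases}
\qquad (228.1.1)
\]

Let $\mathfrak{m}$ be an ideal in $M_{n \times n}(R)$.

Claim 1. The set $\mathfrak{i} \subseteq R$ given by

\[ \mathfrak{i} = \{ x \in R \mid x \text{ is an entry of } A,\ A \in \mathfrak{m} \} \]

is an ideal in R, and $\mathfrak{m} = M_{n \times n}(\mathfrak{i})$.

We have $\mathfrak{i} \ne \emptyset$ since $0 \in \mathfrak{i}$. Now let $A = (a_{ij})$ and $B = (b_{ij})$ be matrices in $\mathfrak{m}$, and $x, y \in R$ be
entries of A and B respectively, say $x = a_{ij}$ and $y = b_{kl}$. Then the matrix
$E_{ii} \cdot A \cdot E_{jl} + E_{ik} \cdot B \cdot E_{ll} \in \mathfrak{m}$
has x + y at position (i, l), and it follows: If $x, y \in \mathfrak{i}$, then $x + y \in \mathfrak{i}$. Since $\mathfrak{m}$ is an ideal in
$M_{n \times n}(R)$ it contains, in particular, the matrices $D_r \cdot A$ and $A \cdot D_r$, where

\[ D_r = \sum_{i=1}^{n} r \cdot E_{ii} , \quad r \in R, \]

thus, $rx, xr \in \mathfrak{i}$. This shows that $\mathfrak{i}$ is an ideal in R. Furthermore, $M_{n \times n}(\mathfrak{i}) \subseteq \mathfrak{m}$:
if $x = a_{ij}$ is an entry of some $A \in \mathfrak{m}$, then $x E_{kl} = E_{ki} \cdot A \cdot E_{jl} \in \mathfrak{m}$ for all k, l, and every matrix
with entries in $\mathfrak{i}$ is a sum of such matrices.

By construction, any matrix $A \in \mathfrak{m}$ has entries in $\mathfrak{i}$, so we have

\[ A = \sum_{1 \le i, j \le n} a_{ij} E_{ij}, \quad a_{ij} \in \mathfrak{i}, \]

so $A \in M_{n \times n}(\mathfrak{i})$. Therefore $\mathfrak{m} \subseteq M_{n \times n}(\mathfrak{i})$.

A consequence of this is: If F is a field then $M_{n \times n}(F)$ is simple.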
Chapter 229


15A36 — Matrices of integers 


229.1 permutation matrix 


A permutation matrix is any matrix which can be created by rearranging the rows and/or
columns of an identity matrix.

Pre-multiplying a matrix A by a permutation matrix P results in a rearrangement of the
rows of A. Post-multiplying by P results in a rearrangement of the columns of A.

Let A be an n × n matrix. If the matrix P is obtained by swapping rows i and j of the n × n
identity matrix $I_n$, then rows i and j of A will be swapped in the product PA, and columns
i and j of A will be swapped in the product AP.
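
The row and column swaps can be seen directly on a small example. This is a minimal sketch with 0-based indices; `swap_permutation` and `mat_mul` are illustrative helper names.

```python
def mat_mul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def swap_permutation(n, i, j):
    """Identity matrix of size n with rows i and j swapped (0-based)."""
    p = [[int(r == c) for c in range(n)] for r in range(n)]
    p[i], p[j] = p[j], p[i]
    return p


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
P = swap_permutation(3, 0, 2)
PA = mat_mul(P, A)  # rows 0 and 2 of A swapped
AP = mat_mul(A, P)  # columns 0 and 2 of A swapped
```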
Chapter 230


15A39 — Linear inequalities 


230.1 Farkas lemma 


Given an m × n matrix A and a 1 × n row vector c, both with real coefficients, one
and only one of the following systems has a solution:

1. Ax ≤ 0 and cx > 0 for some n-column vector x;

2. wA = c and w ≥ 0 for some m-row vector w.

Equivalently, one and only one of the following has a solution:

1. Ax ≤ 0, x ≤ 0 and cx > 0 for some n-column vector x;

2. wA ≤ c and w ≥ 0 for some m-row vector w.

Remark. Here, Ax ≥ 0 means that every component of Ax is nonnegative, and similarly
with the other expressions.
Chapter 231


15A42 — Inequalities involving 
eigenvalues and eigenvectors 


231.1 Gershgorin’s circle theorem 


Let A be a square complex matrix. Around every element $a_{ii}$ on the diagonal of the matrix,
we draw a circle with radius the sum of the norms of the other elements on the same
row, $\sum_{j \ne i} |a_{ij}|$. Such circles are called Gershgorin discs.

Theorem: Every eigenvalue of A lies in one of these Gershgorin discs.

Proof: Let $\lambda$ be an eigenvalue of A and x its corresponding eigenvector. Choose i such that
$|x_i| = \max_j |x_j|$. Since x can’t be 0, $|x_i| > 0$. Now $Ax = \lambda x$, or looking at the i-th component

\[ (\lambda - a_{ii})\, x_i = \sum_{j \ne i} a_{ij} x_j . \]

Taking the norm on both sides gives

\[ |\lambda - a_{ii}| = \Bigl| \sum_{j \ne i} \frac{a_{ij} x_j}{x_i} \Bigr| \le \sum_{j \ne i} |a_{ij}| . \]
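
The theorem is easy to check numerically for a small matrix. The sketch below computes the row discs for any square matrix, but obtains the eigenvalues only in the 2 × 2 case via the quadratic formula; that restriction and the function names are assumptions of this illustration.

```python
import cmath


def gershgorin_discs(a):
    """Return (center, radius) pairs, one disc per row of a."""
    n = len(a)
    return [(a[i][i], sum(abs(a[i][j]) for j in range(n) if j != i))
            for i in range(n)]


def eigenvalues_2x2(a):
    """Zeros of lambda^2 - tr(a) lambda + det(a)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)


A = [[4, 1], [2, 3]]
discs = gershgorin_discs(A)
eigs = eigenvalues_2x2(A)
# Each eigenvalue must lie in at least one disc (small tolerance for rounding).
inside = [any(abs(lam - c) <= r + 1e-12 for c, r in discs) for lam in eigs]
```

Here the eigenvalue 5 lies on the boundary of the first disc, which shows that the bound can be attained.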
231.2 Gershgorin’s circle theorem result


Since the eigenvalues of A and $A^T$ are the same, you can get an additional set of
discs which has the same centers $a_{ii}$, but a radius calculated by the column, $\sum_{j \ne i} |a_{ji}|$.
Every eigenvalue must lie in one of these discs as well. Hence, by comparing the row and column
discs, the eigenvalues may be located efficiently.


Version: 3 Owner: saki Author(s): saki 


231.3 Schur’s inequality


Theorem (Schur’s inequality) Let A be an n × n matrix with real (or possibly complex)
entries. If $\lambda_1, \dots, \lambda_n$ are the eigenvalues of A, and D is the diagonal matrix $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$,
then

\[ \operatorname{trace} D^* D \le \operatorname{trace} A^* A . \]

Here trace is the trace of a matrix, and * is the complex conjugate transpose. Equality holds if and
only if A is a normal matrix ([1], p. 146).


REFERENCES

1. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical Society, 1994.
Chapter 232


15A48 — Positive matrices and their 
generalizations; cones of matrices 


232.1 negative definite 


Let A be an n × n symmetric real square matrix. If for any non-zero vector x we have

\[ x^T A x < 0, \]

we call A a negative definite matrix.


232.2 negative semidefinite


Let A be an n × n symmetric real square matrix. If for any non-zero vector x we have

\[ x^T A x \le 0, \]

we call A a negative semidefinite matrix.
232.3 positive definite


Introduction

The definiteness of a matrix is an important property that has use in many areas of mathematics
and even physics. Below are some examples:

1. In optimization problems, the definiteness of the Hessian matrix determines the quality
of an extremal value.

2. In electromagnetism, one can show that the definiteness of a certain media matrix determines
the qualitative property of the media: if the matrix is positive or negative definite,
the media is active respectively lossy [1].

Definition [2] Suppose A is an n × n Hermitian matrix. If, for any non-zero vector
x, we have that

\[ x^* A x > 0, \]

then A is a positive definite matrix. (Here $x^* = \bar{x}^T$, where $\bar{x}$ is the complex conjugate of x,
and $x^T$ is the transpose of x.)

One can show that a Hermitian matrix is positive definite if and only if all its eigenvalues
are positive [2]. Thus the determinant of a positive definite matrix is positive, and a positive
definite matrix is always invertible. The Cholesky decomposition provides an economical
method for solving linear equations involving a positive definite matrix. Further conditions
and properties for positive definite matrices are given in [3].


REFERENCES 


1. I.V. Lindell, Methods for Electromagnetic Field Analysis, Clarendon Press, 1992. 

2. M. C. Pease, Methods of Matrix Algebra, Academic Press, 1965 

3. C.R. Johnson, Positive definite matrices, American Mathematical Monthly, Vol. 77, Issue 
3 (March 1970) 259-264. 
232.4 positive semidefinite


Let A be an n × n symmetric real square matrix. If for any non-zero vector x we have

\[ x^T A x \ge 0, \]

we call A a positive semidefinite matrix.
232.5 primitive matrix


A nonnegative square matrix A is said to be a primitive matrix if there exists k such that
$A^k \gg 0$, i.e., if there exists k such that for all i, j, the (i, j) entry of $A^k$ is $a^{(k)}_{ij} > 0$.

A sufficient condition for a matrix to be a primitive matrix is for the matrix to be an
irreducible matrix with positive trace.
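
The definition suggests a direct test: raise the zero/nonzero pattern of A to successive powers and look for a strictly positive one. The sketch below stops at the exponent $n^2 - 2n + 2$, relying on Wielandt's bound for primitive matrices; that bound is not stated in the entry above and is an assumption of this illustration, as is the function name `is_primitive`.

```python
def bool_mul(a, b):
    """Product of two 0/1 pattern matrices (entry is 1 if some path exists)."""
    n = len(a)
    return [[int(any(a[i][k] and b[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]


def is_primitive(a):
    """Test whether some power of the nonnegative square matrix a is positive."""
    n = len(a)
    pattern = [[int(x > 0) for x in row] for row in a]
    power = pattern
    # Wielandt's bound: if a is primitive, a^(n^2 - 2n + 2) is already positive.
    for _ in range(n * n - 2 * n + 2):
        if all(all(row) for row in power):
            return True
        power = bool_mul(power, pattern)
    return False


primitive_example = is_primitive([[0, 1], [1, 1]])      # its square is positive
non_primitive_example = is_primitive([[0, 1], [1, 0]])  # powers alternate with I
```

Working with 0/1 patterns avoids large numbers, since only the positions of the positive entries matter for a nonnegative matrix.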
232.6 reducible matrix


A nonnegative n × n matrix A is a reducible matrix if there exists a permutation matrix P
such that

\[ P^T A P = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix} \]

with square diagonal blocks $A_{11}$ and $A_{22}$.

A is an irreducible matrix if it is not a reducible matrix. Two equivalent conditions for a
matrix to be an irreducible matrix are:

1. the digraph associated to A is strongly connected;

2. for each i, j, there exists k such that the (i, j) entry in $A^k$ is $a^{(k)}_{ij} > 0$.
Chapter 233


15A51 — Stochastic matrices 


233.1 Birkhoff-von Neumann theorem


An n × n matrix is doubly stochastic if and only if it is a convex combination of permutation matrices.


233.2 proof of Birkhoff-von Neumann theorem


First, we prove the following lemma:

Lemma 4. A convex combination of doubly stochastic matrices is doubly stochastic.

Let $\{A_i\}_{i=1}^{m}$ be a collection of n × n doubly-stochastic matrices, and suppose $\{\lambda_i\}_{i=1}^{m}$ is a
collection of scalars satisfying $\sum_{i=1}^{m} \lambda_i = 1$ and $\lambda_i \ge 0$ for each i = 1, ..., m. We claim that
$A = \sum_{i=1}^{m} \lambda_i A_i$ is doubly stochastic.

Take any $i \in \{1, \dots, m\}$. Since $A_i$ is doubly stochastic, each of its rows and columns sum
to 1. Thus each of the rows and columns of $\lambda_i A_i$ sum to $\lambda_i$.

By the definition of elementwise summation, given matrices $N = M_1 + M_2$, the sum of the
entries in the ith column of N is clearly the sum of the sums of entries of the ith columns of
$M_1$ and $M_2$ respectively. A similar result holds for the jth row.

Hence the sum of the entries in the ith column of A is the sum of the sums of entries of the
ith columns of $\lambda_k A_k$ for each k, that is, $\sum_{k=1}^{m} \lambda_k = 1$. The sum of entries of the jth row of
A is the same. Since the entries of A are also nonnegative, A is doubly stochastic.

Observe that a permutation matrix has a single nonzero entry, equal to 1, in each
row and column, so the sum of entries in any row or column must be 1. So a permutation
matrix is doubly stochastic, and on applying the lemma, we see that a convex combination
of permutation matrices is doubly stochastic.


This provides one direction of our proof; we now prove the more difficult direction: suppose
B is doubly stochastic. Define a weighted graph G = (V, E) with vertex set $V =
\{r_1, \ldots, r_n, c_1, \ldots, c_n\}$, edge set E, where $e_{ij} = (r_i, c_j) \in E$ if $B_{ij} \ne 0$, and edge weight
w, where $w(e_{ij}) = B_{ij}$.


Clearly G is a bipartite graph, with partitions $R = \{r_1, \ldots, r_n\}$ and $C = \{c_1, \ldots, c_n\}$, since
the only edges in E are between $r_i$ and $c_j$ for some i, j ∈ {1, ..., n}. Furthermore, since
$B_{ij} > 0$ whenever $e_{ij} \in E$, we have w(e) > 0 for every e ∈ E.


For any $A \subseteq V$ define N(A), the neighbourhood of A, to be the set of vertices u ∈ V such
that there is some v ∈ A such that (u, v) ∈ E.


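We claim that, for any v ∈ V, $\sum_{u \in N(\{v\})} w(u, v) = 1$. Take any v ∈ V; either v ∈ R or
v ∈ C. Since G is bipartite, v ∈ R implies $N(\{v\}) \subseteq C$, and v ∈ C implies $N(\{v\}) \subseteq R$.


Now,

$$v = r_i \implies \sum_{u \in N(\{r_i\})} w(r_i, u) = \sum_{j:\, e_{ij} \in E} w(r_i, c_j) = \sum_{j:\, B_{ij} \ne 0} B_{ij} = \sum_{j=1}^n B_{ij} = 1,$$

$$v = c_j \implies \sum_{u \in N(\{c_j\})} w(u, c_j) = \sum_{i:\, e_{ij} \in E} w(r_i, c_j) = \sum_{i:\, B_{ij} \ne 0} B_{ij} = \sum_{i=1}^n B_{ij} = 1,$$

since B is doubly stochastic. Now, take any $A \subseteq R$. We have

$$\sum_{v \in A,\ w \in N(A)} w(v, w) = \sum_{v \in A} \sum_{w \in N(\{v\})} w(v, w) = \sum_{v \in A} 1 = |A|.$$

Let S = N(A). But then clearly $A \subseteq N(S)$, by definition of neighbourhood. So

$$|N(A)| = |S| = \sum_{v \in S,\ w \in N(S)} w(v, w) \ge \sum_{v \in S,\ w \in A} w(v, w) = \sum_{w \in A,\ v \in N(A)} w(v, w) = |A|.$$

So $|N(A)| \ge |A|$. We may therefore apply the graph-theoretic version of Hall's marriage theorem
to G to conclude that G has a perfect matching.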


So let $M \subseteq E$ be a perfect matching for G. Define an n × n matrix P by

$$P_{ij} = \begin{cases} 1 & \text{if } e_{ij} \in M \\ 0 & \text{otherwise} \end{cases}$$

Note that $B_{ij} = 0$ implies $P_{ij} = 0$: if $B_{ij} = 0$, then $(r_i, c_j) \notin E$, so $(r_i, c_j) \notin M$, which
implies $P_{ij} = 0$.


Further, we claim that P is a permutation matrix:
Let i be any row of P. Since M is a perfect matching of G, there exists $e_0 \in M$ such that
$r_i$ is an end of $e_0$. Let the other end be $c_j$ for some j; then $P_{ij} = 1$.

Suppose $i_1, i_2 \in \{1, \ldots, n\}$ with $i_1 \ne i_2$ and $P_{i_1 j} = P_{i_2 j} = 1$ for some j. This implies
$(r_{i_1}, c_j), (r_{i_2}, c_j) \in M$, but this implies the vertex $c_j$ is the end of two distinct edges, which
contradicts the fact that M is a matching. The same argument with the roles of rows and columns exchanged applies to the columns of P.

Hence, for each row and column of P, there is exactly one nonzero entry, whose value is 1.
So P is a permutation matrix.


Define $\lambda = \min_{i, j \in \{1, \ldots, n\}} \{ B_{ij} \mid P_{ij} \ne 0 \}$. We see that λ > 0 since $B_{ij} \ge 0$, and $P_{ij} \ne 0 \Rightarrow B_{ij} \ne 0$.
Further, $\lambda = B_{pq}$ for some p, q with $P_{pq} = 1$.


Let D = B − λP. If D = 0, then λ = 1 and B is a permutation matrix, so we are done.
Otherwise, note that D is nonnegative; this is clear since $\lambda P_{ij} \le \lambda \le B_{ij}$ whenever $P_{ij} \ne 0$, while $\lambda P_{ij} = 0 \le B_{ij}$ otherwise.


Notice that $D_{pq} = B_{pq} - \lambda P_{pq} = \lambda - \lambda \cdot 1 = 0$.


Note that since every row and column of B and P sums to 1, every row and column of
D = B − λP sums to 1 − λ. Define $B' = \frac{1}{1 - \lambda} D$. Then every row and column of B' sums to
1, so B' is doubly stochastic.


Rearranging, we have B = λP + (1 − λ)B'. Clearly $B_{ij} = 0$ implies that $P_{ij} = 0$, which
implies that $B'_{ij} = 0$, so the zero entries of B' are a superset of those of B. But notice that
$B'_{pq} = \frac{1}{1 - \lambda} D_{pq} = 0$ while $B_{pq} = \lambda > 0$, so the zero entries of B' are a strict superset of those of B.

We have decomposed B into a convex combination of a permutation matrix and another 
doubly stochastic matrix with strictly more zero entries than B. Thus we may apply this 
procedure repeatedly on the doubly stochastic matrix obtained from the previous step, and 
the number of zero entries will increase with each step. Since B has at most $n^2$ nonzero
entries, we will obtain a convex combination of permutation matrices in at most $n^2$ steps.


Thus B is indeed expressible as a convex combination of permutation matrices. 
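
The proof is constructive, and the procedure can be sketched in code. The version below is an illustration under two assumptions that are not part of the proof: the perfect matching is found with a simple augmenting-path search (Kuhn's algorithm) instead of by appeal to Hall's theorem, and the residual matrix D is kept unscaled, so the returned coefficients are the final weights in the convex combination. The function names are hypothetical.

```python
from fractions import Fraction


def perfect_matching(support):
    """support[i][j] is True when entry (i, j) is nonzero.
    Returns match_col, where match_col[j] is the row matched to column j,
    or None if no perfect matching exists."""
    n = len(support)
    match_col = [None] * n

    def try_row(i, visited):
        # Try to match row i, re-routing previously matched rows if needed.
        for j in range(n):
            if support[i][j] and j not in visited:
                visited.add(j)
                if match_col[j] is None or try_row(match_col[j], visited):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, set()):
            return None
    return match_col


def birkhoff_decomposition(B):
    """Return a list of (lambda, P) with B = sum of lambda * P,
    where each P is a permutation matrix and the lambdas sum to 1."""
    n = len(B)
    D = [[Fraction(x) for x in row] for row in B]  # unscaled residual
    terms = []
    while any(x != 0 for row in D for x in row):
        match_col = perfect_matching([[x != 0 for x in row] for row in D])
        if match_col is None:
            raise ValueError("input is not doubly stochastic")
        # lambda is the smallest residual entry on the matched positions.
        lam = min(D[match_col[j]][j] for j in range(n))
        P = [[1 if match_col[j] == i else 0 for j in range(n)]
             for i in range(n)]
        terms.append((lam, P))
        for j in range(n):
            D[match_col[j]][j] -= lam  # at least one entry becomes zero
    return terms
```

Exact fractions are used so that the "at least one new zero per step" argument from the proof holds literally; with floating-point entries a tolerance would be needed.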
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Chapter 234 


15A57 — Other types of matrices 
(Hermitian, skew-Hermitian, etc.) 


234.1 Hermitian matrix 


A matrix A is said to be Hermitian or self-adjoint if

$$A = \overline{A^T} = \overline{A}^T,$$

where $A^T$ is the transpose and $\overline{A}$ is the complex conjugate.


Note that a Hermitian matrix must have real diagonal elements, as the complex conjugate
of these elements must be equal to themselves.


Any real symmetric matrix is Hermitian; the real symmetric matrices are a subset of the
Hermitian matrices.


An example of a Hermitian matrix is

$$\begin{pmatrix} 1 & 1+i & 1+2i & 1+3i \\ 1-i & 2 & 2+2i & 2+3i \\ 1-2i & 2-2i & 3 & 3+3i \\ 1-3i & 2-3i & 3-3i & 4 \end{pmatrix}.$$

Hermitian matrices are named after Charles Hermite (1822-1901) [2], who proved in 1855
that the eigenvalues of these matrices are always real [1].
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234.2 direct sum of Hermitian and skew-Hermitian ma- 
trices 


In this example, we show that any square matrix with complex entries can uniquely be
decomposed into the sum of one Hermitian matrix and one skew-Hermitian matrix. A fancy
way to say this is that the space of complex square matrices is the direct sum of the Hermitian and
skew-Hermitian matrices.


Let us denote the vector space of complex square n × n matrices by M, regarded as a vector space over R
(the Hermitian matrices are not closed under multiplication by i, so they do not form a subspace over C). Further,
we denote by $M_+$, respectively $M_-$, the vector subspaces of Hermitian and skew-Hermitian
matrices. We claim that

$$M = M_+ \oplus M_-. \qquad (234.2.1)$$

Since $M_+$ and $M_-$ are vector subspaces of M, it is clear that $M_+ + M_-$ is a vector subspace
of M. Conversely, suppose A ∈ M. We can then define

$$A_+ = \frac{1}{2}(A + A^*),$$

$$A_- = \frac{1}{2}(A - A^*).$$

Here $A^* = \overline{A}^T$, $\overline{A}$ is the complex conjugate of A, and $A^T$ is the transpose of A. It
follows that $A_+$ is Hermitian and $A_-$ is skew-Hermitian. Since $A = A_+ + A_-$, any element in
M can be written as the sum of one element in $M_+$ and one element in $M_-$. Let us check
that this decomposition is unique. If $A \in M_+ \cap M_-$, then $A = A^* = -A$, so A = 0. We
have established equation 234.2.1.
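
As a small numerical illustration, the two parts $A_+$ and $A_-$ can be computed directly from their definitions. The sketch uses Python's built-in complex numbers and plain nested lists; the names `conjugate_transpose` and `hermitian_split` are illustrative only.

```python
def conjugate_transpose(A):
    """Return A* for a square matrix given as a list of rows."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]


def hermitian_split(A):
    """Return (A_plus, A_minus) with A_plus Hermitian, A_minus
    skew-Hermitian and A = A_plus + A_minus."""
    n = len(A)
    A_star = conjugate_transpose(A)
    A_plus = [[(A[i][j] + A_star[i][j]) / 2 for j in range(n)]
              for i in range(n)]
    A_minus = [[(A[i][j] - A_star[i][j]) / 2 for j in range(n)]
               for i in range(n)]
    return A_plus, A_minus
```

For a 1 × 1 matrix the two parts are the real part and i times the imaginary part of the single entry.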


Special cases 


• In the special case of 1 × 1 matrices, we obtain the decomposition of a complex number
into its real and imaginary components.


• In the special case of real matrices, we obtain the decomposition of an n × n matrix into
a symmetric matrix and an anti-symmetric matrix.
Version: 1 Owner: mathcam Author(s): matte 
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234.3 identity matrix 


The n × n identity matrix I (or $I_n$) over a ring R is the square matrix with coefficients in
R given by

$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix},$$

where the numerals "1" and "0" respectively represent the multiplicative and additive identities
in R. The identity matrix $I_n$ serves as the multiplicative identity in the ring of n × n matrices over R. For
any n × n matrix M, we have $I_n M = M I_n = M$, and the identity matrix is uniquely defined
by this property. In addition, for any n × m matrix A and m × n matrix B, we have IA = A and
BI = B.


Properties
The n × n identity matrix I has the following properties:


• For the determinant we have det I = 1, and for the trace we have tr I = n.


• The identity matrix has only one eigenvalue, λ = 1, of multiplicity n. The corresponding
eigenvectors can be chosen to be $v_1 = (1, 0, \ldots, 0), \ldots, v_n = (0, \ldots, 0, 1)$.


• The matrix exponential of I gives $e^I = eI$.
• The identity matrix is a diagonal matrix.
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234.4 skew-Hermitian matrix 


Definition. A square matrix A with complex entries is skew-Hermitian if

$$A^* = -A.$$

Here $A^* = \overline{A}^T$, $A^T$ is the transpose of A, and $\overline{A}$ is the complex conjugate of the matrix
A.

Properties.
1. The trace of a skew-Hermitian matrix is purely imaginary or zero.
2. The eigenvalues of a skew-Hermitian matrix are purely imaginary or zero [1].


Proof. For property (1), let $x_{ij}$ and $y_{ij}$ be the real respectively imaginary parts of the
elements in A. Then the diagonal elements of A are of the form $x_{kk} + i y_{kk}$, and the diagonal
elements in $-A^*$ are of the form $-x_{kk} + i y_{kk}$. Hence $x_{kk}$, i.e., the real part for the diagonal
elements in A, must vanish, and property (1) follows. For property (2), suppose A is a
skew-Hermitian matrix, and x an eigenvector corresponding to the eigenvalue λ, i.e.,

$$Ax = \lambda x. \qquad (234.4.1)$$

Here, x is a complex column vector. Multiplying both sides by $x^* = \overline{x}^T$ yields

$$x^* A x = \lambda x^* x.$$

Since x is an eigenvector, x is not the zero vector, and the factor $x^* x$ on the right hand side of the above
equation is a positive real number. For column vectors x, y, and a square matrix A, we have
that $(Ax)^T = x^T A^T$ and $x^T y = y^T x$. Thus

$$\overline{x^* A x} = x^T \overline{A}\, \overline{x} = (x^T \overline{A}\, \overline{x})^T = \overline{x}^T \overline{A}^T x = x^* A^* x = -x^* A x,$$

so the left hand side $x^* A x$ of the equation above is purely imaginary or zero. Hence the eigenvalue
λ corresponding to x is purely imaginary or zero.


REFERENCES 
1. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons, 
1978. 
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234.5 transpose 


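The transpose of a matrix A is the matrix formed by "flipping" A about the diagonal line
from the upper left corner. It is usually denoted $A^t$, although sometimes it is written as $A^T$
or $A'$. So if A is an m × n matrix and

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

then

$$A^t = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}.$$

Note that the transpose of an m × n matrix is an n × m matrix.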


Let A and B be m × n matrices and c be a constant. Let x and y be column vectors with n
rows. Then


1. $(A^t)^t = A$

2. $(A + B)^t = A^t + B^t$
3. $(cA)^t = cA^t$

4. $(AB)^t = B^t A^t$ (whenever the product AB is defined)


5. If A is invertible, then $(A^t)^{-1} = (A^{-1})^t$
6. $\mathrm{trace}(A^t A) \ge 0$ (where trace is the trace of a matrix).


7. The transpose is a linear mapping from the vector space of matrices to itself. That is,
$(\alpha A + \beta B)^t = \alpha A^t + \beta B^t$, for same-sized matrices A and B and scalars α and β.
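
Several of these identities are easy to spot-check numerically. The following is a minimal sketch with matrices stored as lists of rows; the helper names `transpose`, `mat_mul` and `trace` are illustrative and not taken from the entry.

```python
def transpose(A):
    """Flip A about its main diagonal; rows become columns."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))
```

For real A, trace($A^t A$) is the sum of the squares of all entries of A, which is why property 6 holds.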


The familiar dot product can also be defined using the matrix transpose. If x and y
are column vectors with n rows each,

$$x^t y = x \cdot y,$$

which implies

$$x^t x = x \cdot x = \|x\|_2^2,$$

which is another way of defining the square of the vector Euclidean norm.
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Chapter 235 


15A60 — Norms of matrices, 
numerical range, applications of 
functional analysis to matrix theory 


235.1 Frobenius matrix norm 


Let R be an ordered ring with a valuation |·| and let M(R) denote the set of matrices over
R. The Frobenius or Euclidean matrix norm is the norm function
$\|\cdot\|_F : M(R) \to R$ given by

$$\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2}.$$

A more concise (though equivalent) definition is

$$\|A\|_F = \sqrt{\mathrm{trace}(A A^*)},$$

where $A^*$ denotes the conjugate transpose of A.
Denote the columns of A by $A_i$. A nice property of the norm is that

$$\|A\|_F^2 = \|A_1\|_2^2 + \|A_2\|_2^2 + \cdots + \|A_n\|_2^2.$$


Version: 8 Owner: mathcam Author(s): mathcam, Logan 




235.2 matrix p-norm 


A class of matrix norms, denoted $\|\cdot\|_p$, is defined as

$$\|A\|_p = \sup_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p}, \qquad x \in \mathbb{R}^n,\ A \in \mathbb{R}^{m \times n}.$$

The matrix p-norms are defined in terms of the vector p-norms.


An alternate definition is

$$\|A\|_p = \max_{\|x\|_p = 1} \|Ax\|_p.$$

As with vector p-norms, the most important are the 1, 2, and ∞ norms. The 1 and ∞ norms
are very easy to calculate for an arbitrary matrix:

$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |a_{ij}|,$$

$$\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}|.$$

That is, the 1-norm is the largest absolute column sum and the ∞-norm is the largest absolute row sum.
It directly follows from this that $\|A\|_1 = \|A^T\|_\infty$.


The calculation of the 2-norm is more complicated. However, it can be shown that the
2-norm of A is the square root of the largest eigenvalue of $A^T A$. There are also various
inequalities that allow one to make estimates on the value of $\|A\|_2$:

$$\frac{1}{\sqrt{n}} \|A\|_\infty \le \|A\|_2 \le \sqrt{m}\, \|A\|_\infty,$$

$$\frac{1}{\sqrt{m}} \|A\|_1 \le \|A\|_2 \le \sqrt{n}\, \|A\|_1,$$

$$\|A\|_2 \le \|A\|_F \le \sqrt{n}\, \|A\|_2.$$

($\|A\|_F$ is the Frobenius matrix norm.)
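
The column-sum and row-sum formulas translate directly into code. The sketch below computes the 1-norm, the ∞-norm and the Frobenius norm for a matrix stored as a list of rows; the function names are illustrative. The 2-norm is left out because it needs an eigenvalue computation.

```python
import math


def norm_1(A):
    """Largest absolute column sum."""
    rows, cols = len(A), len(A[0])
    return max(sum(abs(A[i][j]) for i in range(rows)) for j in range(cols))


def norm_inf(A):
    """Largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)


def norm_frobenius(A):
    """Square root of the sum of squared absolute values of all entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in A for x in row))
```

For A = [[1, -2], [3, 4]] the column sums are 4 and 6 and the row sums are 3 and 7, so the 1-norm is 6 and the ∞-norm is 7.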
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235.3 self consistent matrix norm 


A matrix norm N is said to be self consistent if

$$N(AB) \le N(A) \cdot N(B)$$

for all pairs of matrices A and B such that AB is defined.
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Chapter 236 


15A63 — Quadratic and bilinear 
forms, inner products 


236.1 Cauchy-Schwarz inequality 


Let V be a vector space where an inner product ⟨·, ·⟩ has been defined. Such spaces can be
given also a norm by defining

$$\|x\| = \sqrt{\langle x, x \rangle}.$$

Then in such a space the Cauchy-Schwarz inequality holds:

$$|\langle v, w \rangle| \le \|v\| \, \|w\|$$

for any v, w ∈ V. That is, the modulus (since it might as well be a complex number) of the
inner product for two given vectors is less than or equal to the product of their norms. Equality
happens if and only if the two vectors are linearly dependent.


A very special case is when $V = \mathbb{R}^n$ and the inner product is the dot product, defined as
$\langle v, w \rangle = v^t w$ and usually denoted as v · w, and the resulting norm is the Euclidean norm.


If $v = (v_1, v_2, \ldots, v_n)$ and $w = (w_1, w_2, \ldots, w_n)$, the Cauchy-Schwarz inequality becomes

$$|v \cdot w| = |v_1 w_1 + v_2 w_2 + \cdots + v_n w_n| \le \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}\, \sqrt{w_1^2 + w_2^2 + \cdots + w_n^2} = \|v\| \, \|w\|.$$

Notice that in this case the inequality holds even if the modulus on the middle term (which is a
real number) is not used.


The Cauchy-Schwarz inequality is also a special case of the Hölder inequality. The inequality arises
in a lot of fields, so it is known under several other names, such as Buniakovsky inequality or
Kantorovich inequality. Another form that arises often is Cauchy-Schwartz inequality, but this
is a misspelling, since the inequality is named after Hermann Amandus Schwarz (1843-1921).
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236.2 adjoint endomorphism 


Definition (the bilinear case). Let U be a finite-dimensional vector space over a field
K, and B : U × U → K a symmetric, non-degenerate bilinear mapping, for example a real
inner product. For an endomorphism T : U → U we define the adjoint of T relative to B to
be the endomorphism $T^\star : U \to U$, characterized by

$$B(u, Tv) = B(T^\star u, v), \qquad u, v \in U.$$

It is convenient to identify B with a linear isomorphism $B : U \to U^\vee$ in the sense that

$$B(u, v) = (Bu)(v), \qquad u, v \in U.$$

We then have

$$T^\star = B^{-1} T^\vee B.$$

To put it another way, B gives an isomorphism between U and the dual $U^\vee$, and the adjoint
$T^\star$ is the endomorphism of U that corresponds to the dual homomorphism $T^\vee : U^\vee \to U^\vee$.

[Diagram relating $T^\star$ and $T^\vee$ via B; not recoverable from the source.]


Relation to the matrix transpose. Let $u_1, \ldots, u_n$ be a basis of U, and let $M \in
\mathrm{Mat}_{n,n}(K)$ be the matrix of T relative to this basis, i.e.

$$\sum_j M_{ji} u_j = T(u_i).$$

Let $P \in \mathrm{Mat}_{n,n}(K)$ denote the matrix of the inner product relative to the same basis, i.e.

$$P_{ij} = B(u_i, u_j).$$

Then, the representing matrix of $T^\star$ relative to the same basis is given by $P^{-1} M^T P$.
Specializing further, suppose that the basis in question is orthonormal, i.e. that

$$B(u_i, u_j) = \delta_{ij}.$$

Then, the matrix of $T^\star$ is simply the transpose $M^T$.
The Hermitian (sesqui-linear) case. Let T : U → U be an endomorphism of a unitary space
(a complex vector space equipped with a Hermitian inner product). In this setting we
define the Hermitian adjoint $T^\star : U \to U$ by means of the familiar adjointness
condition

$$\langle u, Tv \rangle = \langle T^\star u, v \rangle, \qquad u, v \in U.$$

However, the analogous operation at the matrix level is the conjugate transpose. Thus, if
$M \in \mathrm{Mat}_{n,n}(\mathbb{C})$ is the matrix of T relative to an orthonormal basis, then $\overline{M}^T$ is the matrix
of $T^\star$ relative to the same basis.
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236.3 anti-symmetric 


A relation R on A is antisymmetric iff ∀x, y ∈ A, (xRy ∧ yRx) → (x = y). The number
of possible antisymmetric relations on A is $2^n 3^{(n^2 - n)/2}$ out of the $2^{n^2}$ total possible relations,
where n = |A|.


Antisymmetric is not the same thing as "not symmetric", as it is possible to have both at
the same time. However, a relation R that is both antisymmetric and symmetric has the
condition that xRy ⇒ x = y. There are only $2^n$ such possible relations on A.


An example of an antisymmetric relation on A = {∘, ×, ⋆} would be R = {(⋆, ⋆), (×, ∘), (∘, ⋆), (⋆, ×)}.
One relation that isn't antisymmetric is R = {(×, ∘), (⋆, ∘), (∘, ⋆)} because we have both ⋆R∘
and ∘R⋆, but ∘ ≠ ⋆.
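
For small sets the counting formula can be confirmed by brute force. The sketch below enumerates every relation on {0, ..., n−1} as a set of ordered pairs and counts the antisymmetric ones; the function names are illustrative, and the enumeration is only feasible for very small n because there are $2^{n^2}$ relations.

```python
from itertools import product


def is_antisymmetric(R):
    """R is a set of ordered pairs; xRy and yRx together must force x == y."""
    return all(x == y for (x, y) in R if (y, x) in R)


def count_antisymmetric(n):
    """Count the antisymmetric relations on an n-element set by enumeration."""
    pairs = [(x, y) for x in range(n) for y in range(n)]
    count = 0
    # Each relation corresponds to a choice of keep/drop for every pair.
    for mask in product([False, True], repeat=len(pairs)):
        R = {p for p, keep in zip(pairs, mask) if keep}
        if is_antisymmetric(R):
            count += 1
    return count
```

The formula has a direct reading: each of the n diagonal pairs is free (a factor of 2 each), and each of the n(n−1)/2 unordered off-diagonal pairs {x, y} allows three choices: neither direction, only xRy, or only yRx.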
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236.4 bilinear map 


Definition. Let U and V be vector spaces over a field K. A function B : U × V → K is
called a bilinear map if


1. $B(c x_1 + x_2, y) = c B(x_1, y) + B(x_2, y)$, c ∈ K


2. $B(x, c y_1 + y_2) = c B(x, y_1) + B(x, y_2)$, c ∈ K


That is, B is bilinear if it is linear in each parameter, taken separately.
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Bilinear forms. If U = V then B is a bilinear form. In this case further assumptions 
are often made: 


1. B(x, y) = B(y, x), x, y ∈ V (symmetric)
2. B(x, y) = −B(y, x), x, y ∈ V (skew-symmetric)
3. B(x, x) = 0, x ∈ V (alternating)


By expanding B(x + y, x + y) = 0, we can show alternating implies skew-symmetric. Further,
if K is not of characteristic 2, then skew-symmetric implies alternating.


Left and Right Maps. We may regard the bilinear map as a left map or right map, as
follows:

$$B_L \in L(U, V^*), \qquad B_R \in L(V, U^*)$$
$$B_L(x)(y) = B(x, y)$$
$$B_R(y)(x) = B(x, y)$$

The left map is a linear map from U into the dual of V, and the right map is a linear map from V into the dual of U. So for example B is skew-symmetric
iff $B_L = -B_R$.


Matrix Representation. Suppose U and V are finite-dimensional and we have chosen
bases $\mathcal{B}_1 = \{e_1, \ldots\}$ and $\mathcal{B}_2 = \{f_1, \ldots\}$. Now we define the matrix C with entries $C_{ij} =
B(e_i, f_j)$. This will be the matrix associated to B with respect to these bases as follows: if
we write x ∈ U and y ∈ V as column vectors in terms of the chosen bases, then $B(x, y) = x^T C y$.
Further, if we choose the corresponding dual bases for $U^*$ and $V^*$, then C and $C^T$ are the
corresponding matrices for $B_R$ and $B_L$, respectively (in the sense of linear maps). Thus we
see that a symmetric bilinear form is represented by a symmetric matrix, and similarly for
skew-symmetric forms.


Let $\mathcal{B}'_1$ and $\mathcal{B}'_2$ be new bases, and P and Q the corresponding change of basis matrices. Then
the new matrix is $C' = P^T C Q$.


Rank. If U and V are finite dimensional, it may be shown that $\mathrm{rank}\, B_L = \mathrm{rank}\, B_R$. We
call this simply the rank of B, and denote it by r. We say that B is non-degenerate if the left and right maps
are both linear isomorphisms.


Now applying the rank-nullity theorem on both left and right maps gives the following results:

$$\dim U = \dim \ker B_L + r$$
$$\dim V = \dim \ker B_R + r$$


Orthogonals. If $T \subseteq U$ and $S \subseteq V$ then we may define the orthogonals $T^\perp \subseteq V$ and
${}^\perp S \subseteq U$ as follows:

$$T^\perp = \{ v \mid B(t, v) = 0 \ \forall t \in T \}$$
$${}^\perp S = \{ u \mid B(u, s) = 0 \ \forall s \in S \}$$

The orthogonal of a subspace is itself a subspace. Further, if B is a symmetric or skew-symmetric bilinear form
then ${}^\perp A = A^\perp$, and we may choose the latter notation.
T is a non-degenerate subspace if $T \cap T^\perp = \{0\}$. Similarly when $S \cap {}^\perp S = \{0\}$.


We may also realise $T^\perp$ by considering the restriction $B' = B|_{T \times V}$. It's clear that $T^\perp =
\ker B'_R$. Now if B is non-degenerate (or more generally $T \cap {}^\perp V = \{0\}$) then $\ker B'_L = \{0\}$,
and we can use the rank-nullity equations to get $\dim V = \dim T + \dim T^\perp$. Similarly we may
show that $\dim U = \dim S + \dim {}^\perp S$.


Canonical Representations for Symmetric Forms. If B : V × V → K is a symmetric
bilinear form over a finite-dimensional vector space, then there is an orthogonal basis such
that B is represented by

$$\begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix}.$$

Denote the rank of B by r.
If $K = \mathbb{R}$ we may choose a basis such that $a_1 = \cdots = a_t = 1$, $a_{t+1} = \cdots = a_{t+p} = -1$ and
$a_{t+p+j} = 0$, for some integers p and t. Further, these integers are invariants of the bilinear
form. This is known as Sylvester's Law of Inertia. B is positive definite iff t = n,
p = 0. Such a form constitutes a real inner product space.


If $K = \mathbb{C}$ we may go further and choose a basis such that $a_1 = \cdots = a_r = 1$ and $a_{r+j} = 0$.


If $K = \mathbb{F}_p$ we may choose a basis such that $a_1 = \cdots = a_{r-1} = 1$, $a_{r+j} = 0$ and $a_r = n$ or
$a_r = 1$, where n is the least positive quadratic non-residue.
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Adjoint. Suppose B : U x V — K is a non-degenerate bilinear map. If T € L(U,U) then 
we define the adjoint of T, $T^\star \in L(V, V)$, to be the unique linear map such that:

$$B(Tu, v) = B(u, T^\star v)$$

Let $T^\vee : U^* \to U^*$ be the dual endomorphism. Then $T^\star = B_R^{-1} \circ T^\vee \circ B_R$.


If U = V and we choose a canonical (orthogonal) basis for B, then the adjoint corresponds
to the matrix transpose.
T is then said to be a normal operator (with respect to this bilinear map) if it commutes
with its adjoint.


Examples. An important example is the non-degenerate bilinear map

$$B : V \times V^* \to K$$
$$B(v, f) = f(v)$$

Here the orthogonal is exactly the annihilator. This gives the result that $\dim U + \dim U^0 =
\dim V$.


An n × m matrix may be regarded as a bilinear map over $K^n$ and $K^m$. Two matrices, B
and C, are then said to be congruent if there exists an invertible P such that $B = P^T C P$.
Note this is different to the usual notion of congruence.


If the matrix is the identity, I, then this gives the standard Euclidean inner product on $K^n$.


An inner product on a vector space is a bilinear form if its field is real, but not
if it is complex. In fact, the bilinear form associated with a real inner product space is
non-degenerate, and every subspace is non-degenerate. So, as we may intuitively expect,
$V = U \oplus U^\perp$ for every subspace U.
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236.5 dot product 


Let $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$ be two vectors in $k^n$ where k is a field (like
$\mathbb{R}$ or $\mathbb{C}$). Then we define the dot product of the two vectors as:

$$u \cdot v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$

Notice that u · v is NOT a vector but a scalar (an element from the field k).


If u, v are vectors in $\mathbb{R}^n$ and θ is the angle between them, then we also have

$$u \cdot v = \|u\| \, \|v\| \cos\theta.$$

Thus, in this case, u ⊥ v if and only if u · v = 0.
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236.6 every orthonormal set is linearly independent 


Theorem
Let S be a set of vectors from an inner product space L. If S is orthonormal, then S is
linearly independent.

Proof. We denote by ⟨·, ·⟩ the inner product of L. Let us first consider the case when S is
finite, i.e., $S = \{e_1, \ldots, e_n\}$ for some n. Suppose

$$\lambda_1 e_1 + \cdots + \lambda_n e_n = 0$$

for some scalars $\lambda_i$ (belonging to the field on the underlying vector space of L). For a fixed
k in 1, ..., n, we then have

$$0 = \langle 0, e_k \rangle = \langle \lambda_1 e_1 + \cdots + \lambda_n e_n, e_k \rangle = \lambda_1 \langle e_1, e_k \rangle + \cdots + \lambda_n \langle e_n, e_k \rangle = \lambda_k,$$

so $\lambda_k = 0$, and S is linearly independent. Next, suppose S is infinite (countable or uncountable).
To prove that S is linearly independent, we need to show that all finite subsets of S are
linearly independent. Since any subset of an orthonormal set is also orthonormal, the infinite
case follows from the finite case.


The present result with proof can be found in [1], page 153.


REFERENCES 
1. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons, 
1978. 
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236.7 inner product 


An inner product on a vector space V over a field K (which must be either the field $\mathbb{R}$ of
real numbers or the field $\mathbb{C}$ of complex numbers) is a function ⟨ , ⟩ : V × V → K such that,
for all $k_1, k_2 \in K$ and $v_1, v_2, v, w \in V$, the following properties hold:


1. $\langle k_1 v_1 + k_2 v_2, w \rangle = k_1 \langle v_1, w \rangle + k_2 \langle v_2, w \rangle$ (linearity; see footnote 1)


2. $\langle v, w \rangle = \overline{\langle w, v \rangle}$, where the bar denotes complex conjugation (conjugate symmetry)
3. $\langle v, v \rangle \ge 0$, and $\langle v, v \rangle = 0$ if and only if v = 0 (positive definite)


(Note: Rule 2 guarantees that $\langle v, v \rangle \in \mathbb{R}$, so the inequality $\langle v, v \rangle \ge 0$ in rule 3 makes sense
even when $K = \mathbb{C}$.)


The standard example of an inner product is the dot product on $K^n$:

$$\langle (x_1, \ldots, x_n), (y_1, \ldots, y_n) \rangle = \sum_{i=1}^n x_i \overline{y_i}$$

Every inner product space is a normed vector space, with the norm being defined by $\|v\| :=
\sqrt{\langle v, v \rangle}$.

Footnote 1. A small minority of authors impose linearity on the second coordinate instead of the first coordinate.
236.8 inner product space


A vector space over $\mathbb{R}$ or $\mathbb{C}$ taken with a specific inner product ⟨x, y⟩ forms an inner product
space.


For example, $\mathbb{R}^n$ with the familiar dot product forms an inner product space.


The expression $\sqrt{\langle x, x \rangle}$ is written ‖x‖ and is called the norm. This makes the inner product
space also a normed vector space. That is, the inner product space also has the following
properties:

1. ‖cx‖ = |c| · ‖x‖, c ∈ K.
2. ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0.
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖, the triangle inequality,

where K is the underlying field of the vector space.


In addition, the Cauchy-Schwarz inequality

$$|\langle x, y \rangle| \le \|x\| \, \|y\|$$

holds and follows from the definition of an inner product space.
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236.9 proof of Cauchy-Schwarz inequality 


If a and b are linearly dependent, we write b = λa. So we get:

$$\langle a, \lambda a \rangle^2 = \lambda^2 \langle a, a \rangle^2 = \lambda^2 \|a\|^4 = \|a\|^2 \|b\|^2.$$

So we have equality if a and b are linearly dependent. In the other case we look at the
quadratic function

$$\|x \cdot a + b\|^2 = x^2 \|a\|^2 + 2x \langle a, b \rangle + \|b\|^2.$$

This function is positive for every real x, if a and b are linearly independent. Thus it has
no real zeroes, which means that a quarter of its discriminant,

$$\langle a, b \rangle^2 - \|a\|^2 \|b\|^2,$$

is always negative. So we have:

$$\langle a, b \rangle^2 < \|a\|^2 \|b\|^2,$$

which is the Cauchy-Schwarz inequality if a and b are linearly independent.
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236.10 self-dual 


Definition. Let U be a finite-dimensional inner-product space over a field K. Let T : U →
U be an endomorphism, and note that the adjoint endomorphism $T^\star$ is also an endomorphism
of U. It is therefore possible to add, subtract, and compare T and $T^\star$, and we are
able to make the following definitions. An endomorphism T is said to be self-dual (a.k.a.
self-adjoint) if

$$T = T^\star.$$

By contrast, we say that the endomorphism is anti self-dual if

$$T = -T^\star.$$

Exactly the same definitions can be made for an endomorphism of a complex vector space
with a Hermitian inner product.
Relation to the matrix transpose. All of these definitions have their counterparts in
the matrix setting. Let $M \in \mathrm{Mat}_{n,n}(K)$ be the matrix of T relative to an orthonormal basis
of U. Then T is self-dual if and only if M is a symmetric matrix, and anti self-dual if and
only if M is a skew-symmetric matrix.


In the case of a Hermitian inner product we must replace the transpose with the conjugate transpose.
Thus T is self-dual if and only if M is a Hermitian matrix, i.e.

$$M = \overline{M}^T.$$

It is anti self-dual if and only if

$$M = -\overline{M}^T.$$
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236.11 skew-symmetric bilinear form 


A skew-symmetric (or antisymmetric) bilinear form is a bilinear form B which is
skew-symmetric in the two coordinates; that is, B(x, y) = −B(y, x) for all vectors x and y. In
particular, when the underlying field is not of characteristic 2, this means that B(x, x) = 0.


A bilinear form is skew-symmetric iff its defining matrix is skew-symmetric.
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236.12 spectral theorem 


Let U be a finite-dimensional unitary space and let M : U → U be an endomorphism. We
say that M is normal if it commutes with its Hermitian adjoint, i.e.

$$M M^\star = M^\star M.$$

Spectral Theorem. Let M : U → U be a linear transformation of a unitary space. The following are equivalent:


1. The transformation M is normal.


2. Letting

$$\Lambda = \{ \lambda \in \mathbb{C} \mid M - \lambda \text{ is singular} \}$$

denote the spectrum (set of eigenvalues) of M, the corresponding eigenspaces

$$E_\lambda = \ker(M - \lambda), \qquad \lambda \in \Lambda$$

give an orthogonal, direct sum decomposition of U, i.e.

$$U = \bigoplus_{\lambda \in \Lambda} E_\lambda,$$

and $E_{\lambda_1} \perp E_{\lambda_2}$ for distinct eigenvalues $\lambda_1 \ne \lambda_2$.


3. We can decompose M as the sum

$$M = \sum_{\lambda \in \Lambda} \lambda P_\lambda,$$

where $\Lambda \subset \mathbb{C}$ is a finite subset of complex numbers indexing a family of commuting
orthogonal projections $P_\lambda : U \to U$, i.e.

$$P_\lambda^\star = P_\lambda, \qquad P_\lambda P_\mu = \begin{cases} P_\lambda & \lambda = \mu \\ 0 & \lambda \ne \mu, \end{cases}$$

and where WLOG

$$\sum_{\lambda \in \Lambda} P_\lambda = 1_U.$$

4. There exists an orthonormal basis of U that diagonalizes M.


Remarks. 


1. Here are some important classes of normal operators, distinguished by the nature of
their eigenvalues.


• Hermitian operators. Eigenvalues are real.


• Unitary transformations. Eigenvalues lie on the unit circle, i.e. the set of complex
numbers of modulus 1.


• Orthogonal projections. Eigenvalues are either 0 or 1.


2. There is a well-known version of the spectral theorem for $\mathbb{R}$, namely that a self-adjoint
transformation of a real inner product space can be diagonalized and that
eigenvectors corresponding to different eigenvalues are orthogonal. An even more
down-to-earth version of this theorem says that a symmetric, real matrix can always be
diagonalized by an orthonormal basis of eigenvectors.


3. There are several versions of increasing sophistication of the spectral theorem that
hold in an infinite-dimensional, Hilbert space setting. In such a context one must
distinguish between the so-called discrete and continuous (no corresponding eigenspace)
spectrums, and replace the representing sum for the operator with some kind of an
integral. The definition of self-adjointness is also quite tricky for unbounded
operators. Finally, there are versions of the spectral theorem, of importance in theoretical
quantum mechanics, that can be applied to continuous 1-parameter groups of
commuting, self-adjoint operators.
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Chapter 237 


15A66 — Clifford algebras, spinors 


237.1 geometric algebra 


Geometric algebra is a Clifford algebra which has been used with great success in the
modeling of a wide variety of physical phenomena. Clifford algebra is considered a more general
algebraic framework than geometric algebra. The primary distinction is that geometric
algebra utilizes only real numbers as scalars and to represent magnitudes.


Let $V^n$ be an n-dimensional vector space over the real numbers. The geometric algebra
$\mathcal{G}_n = \mathcal{G}(V^n)$ is a graded algebra similar to Grassmann's exterior algebra, except that the
exterior product is replaced by a more fundamental multiplication operation known as the
geometric product. For vectors $a, b, c \in V^n$ and real scalars $\alpha, \beta \in \mathbb{R}$, the geometric
product satisfies the following axioms:


associativity:    $a(bc) = (ab)c$                                         $a + (b + c) = (a + b) + c$
commutativity:    $\alpha\beta = \beta\alpha$                             $\alpha + \beta = \beta + \alpha$
                  $\alpha b = b \alpha$                                   $\alpha + b = b + \alpha$
                  $ab = \frac{1}{2}(ab + ba) + \frac{1}{2}(ab - ba)$      $a + b = b + a$
distributivity:   $a(b + c) = ab + ac$                                    $(b + c)a = ba + ca$
linearity:        $\alpha(b + c) = \alpha b + \alpha c = (b + c)\alpha$
contraction:      $a^2 = aa = \sum_i \epsilon_i |a_i|^2 = \alpha$         where $\epsilon_i \in \{-1, 0, 1\}$


Commutativity of scalar-scalar multiplication and vector-scalar multiplication is symmetric;
however, in general, vector-vector multiplication is not commutative. The order of
multiplication of vectors is significant. In particular, for parallel vectors:

$$ab = ba$$

and for orthogonal vectors

$$ab = -ba$$

The parallelism of vectors is encoded as a symmetric property, while orthogonality of vectors
is encoded as an antisymmetric property.


The contraction rule specifies that the square of any vector is a scalar equal to the sum of
the squares of the magnitudes of its components in each basis direction. Depending on the
contraction rule for each of the basis directions, the magnitude of the vector may be positive,
negative, or zero. A vector with a magnitude of zero is called a null vector.


The graded algebra 9» generated from V” is defined over a 2”-dimensional linear space. 
This basis entities for this space can be generated by successive application of the geometric 
product to the basis vectors of V” until a [closed set) of basis entities is obtained. The basis 
entites for the space are known as blades. The following multiplication table illustrates the 
generation of basis blades from the basis vectors e1, € € V”. 


€0 ey €2 €12 
€i E1 €12 €1€2 
e2 —e&12 €2 —€2e1 


e12 —€1€2 €2€, —€1€2 
Here, €, and é represent the contraction rule for e; and eg respectively. Note that the basis 


vectors of V” become blades themselves in addition to the multiplicative identity, €o = 1 


and the new bivector ej. © e,e2. As the table demonstrates, this set of basis blades is 
iclosed| under the geometric product. 


The geometric product $ab$ is related to the inner product $a \cdot b$ and the exterior product
$a \wedge b$ by
\[ ab = a \cdot b + a \wedge b = b \cdot a - b \wedge a = 2\, a \cdot b - ba. \]
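
The multiplication table and the relation between the geometric, inner and exterior products can be checked numerically. The following is a minimal sketch, not part of the original entry: the function name `gp`, the tuple layout (scalar, $e_1$, $e_2$, $e_{12}$) and the default contraction rules $\epsilon_1 = \epsilon_2 = 1$ are assumptions made for illustration.

```python
def gp(x, y, eps1=1, eps2=1):
    """Geometric product of multivectors stored as (scalar, e1, e2, e12) tuples.

    eps1 and eps2 are the contraction rules e1*e1 = eps1 and e2*e2 = eps2.
    """
    s, a1, a2, p = x
    t, b1, b2, q = y
    return (
        s * t + eps1 * a1 * b1 + eps2 * a2 * b2 - eps1 * eps2 * p * q,  # scalar part
        s * b1 + a1 * t - eps2 * a2 * q + eps2 * p * b2,                # e1 part
        s * b2 + a2 * t + eps1 * a1 * q - eps1 * p * b1,                # e2 part
        s * q + p * t + a1 * b2 - a2 * b1,                              # e12 part
    )

a = (0, 3, 4, 0)   # the vector 3 e1 + 4 e2
b = (0, 1, 2, 0)   # the vector e1 + 2 e2
ab = gp(a, b)      # scalar part is a.b, e12 part is the coefficient of a^b
ba = gp(b, a)      # same scalar part, opposite e12 part
```

Here the scalar part of `ab` is the inner product and the $e_{12}$ part is the exterior product, so swapping the factors changes only the sign of the bivector part, as the identity $ab = 2\,a \cdot b - ba$ requires.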
Chapter 238


15A69 — Multilinear algebra, tensor 
products 


238.1 Einstein summation convention 


The Einstein summation convention implies that when an index occurs more than once
in the same expression, the expression is implicitly summed over all possible values for that
index. Therefore, in order to use the summation convention, it must be clear from the
context over what range indices should be summed.


The Einstein summation convention is illustrated in the examples below.


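1. Let $\{e_i\}_{i=1}^n$ be an orthonormal basis in $\mathbb{R}^n$. Then the inner product of the
vectors $u = u^i e_i$ ($= \sum_i u^i e_i$) and $v = v^i e_i$ ($= \sum_i v^i e_i$) is
\[ u \cdot v = u^i v^j \, e_i \cdot e_j = \delta_{ij} u^i v^j. \]

2. Let $V$ be a vector space with basis $\{e_i\}_{i=1}^n$ and dual basis $\{e^i\}_{i=1}^n$. Then,
for a vector $v = v^i e_i$ and dual vectors $\alpha = \alpha_i e^i$ and $\beta = \beta_i e^i$, we have
\[ (\alpha + \beta)(v) = \alpha_i v^i + \beta_j v^j = (\alpha_i + \beta_i) v^i. \]
This example shows that the summation convention is "distributive" in a natural way.

3. Let $F : \mathbb{R}^m \to \mathbb{R}^n$, $x \mapsto (F^1(x), \ldots, F^n(x))$, and
$G : \mathbb{R}^n \to \mathbb{R}^p$, $y \mapsto (G^1(y), \ldots, G^p(y))$, be smooth functions. Then
\[ \frac{\partial (G \circ F)^i}{\partial x^j}(x)
   = \frac{\partial G^i}{\partial y^k}(F(x)) \, \frac{\partial F^k}{\partial x^j}(x), \]
where the right hand side is summed over $k = 1, \ldots, n$.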
An index which is summed is called a dummy index or dummy variable. For instance, $i$
is a dummy index in $v^i e_i$. An expression does not depend on a dummy index, i.e., $v^i e_i = v^j e_j$.
It is common that one must change the name of dummy indices. For instance, above, in
Example 1 when we calculated $u \cdot v$, it was necessary to change the index $i$ in $v = v^i e_i$ to $j$
so that it would not clash with $u = u^i e_i$.


When using the Einstein summation convention, objects are usually indexed so that when
summing, one index will always be an "upper index" and the other a "lower index". Then
summing should only take place over upper and lower indices. In the above examples, we
have followed this rule. Therefore we did not write $\delta_{ij} u^i v^j = u^i v^i$ in the first
example, since $u^i v^i$ has two upper indices. This is consistent: it is not possible to take the
inner product of two vectors without a metric, which is here $\delta_{ij}$. The last example
illustrates that when we consider $k$ as a "lower index" in $\partial G^i / \partial y^k$, then the
chain rule obeys this upper-lower rule for the indices.
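
The implicit sums in Examples 1 and 2 can be spelled out as explicit loops. The following is a small sketch with made-up component values; the variable names are illustrative only and indices run from 0 rather than 1, as is usual in Python.

```python
n = 3
u = [1, 2, 3]        # components u^i
v = [4, 5, 6]        # components v^j
alpha = [1, 0, 2]    # components alpha_i of a dual vector
beta = [0, 3, 1]     # components beta_i of a second dual vector

# Kronecker delta, the metric of an orthonormal basis.
delta = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# u . v = delta_ij u^i v^j: both repeated indices i and j are summed.
dot = sum(delta[i][j] * u[i] * v[j] for i in range(n) for j in range(n))

# (alpha + beta)(v) = alpha_i v^i + beta_j v^j = (alpha_i + beta_i) v^i.
# The dummy index in each term may be renamed freely without changing the value.
two_sums = sum(alpha[i] * v[i] for i in range(n)) + sum(beta[j] * v[j] for j in range(n))
one_sum = sum((alpha[i] + beta[i]) * v[i] for i in range(n))
```

Both `two_sums` and `one_sum` evaluate the same expression, which is the "distributive" behaviour noted in Example 2.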
238.2 basic tensor


The present entry employs the terminology and notation defined and described in the entry
on tensor arrays. To keep things reasonably self-contained we mention that the symbol
$T^{p,q}$ refers to the vector space of type $(p, q)$ tensor arrays, i.e. maps
\[ I^p \times I^q \to K, \]
where $I$ is some finite list of index labels, and where $K$ is a field.


We say that a tensor array is a characteristic array, a.k.a. a basic tensor, if all but one of its
values are 0, and the remaining non-zero value is equal to 1. For tuples $A \in I^p$ and $B \in I^q$,
we let
\[ \varepsilon^B_A : I^p \times I^q \to K \]
denote the characteristic array defined by
\[ (\varepsilon^B_A)^{i_1 \ldots i_p}_{j_1 \ldots j_q} =
   \begin{cases} 1 & \text{if } (i_1, \ldots, i_p) = A \text{ and } (j_1, \ldots, j_q) = B, \\
                 0 & \text{otherwise.} \end{cases} \]
The type $(p, q)$ characteristic arrays form a natural basis for $T^{p,q}$.


Furthermore the outer multiplication of two characteristic arrays gives a characteristic array
of larger valence. In other words, for
\[ A_1 \in I^{p_1}, \quad B_1 \in I^{q_1}, \quad A_2 \in I^{p_2}, \quad B_2 \in I^{q_2}, \]
we have that
\[ \varepsilon^{B_1}_{A_1} \varepsilon^{B_2}_{A_2} = \varepsilon^{B_1 B_2}_{A_1 A_2}, \]
where the product on the left-hand side is performed by outer multiplication, and where
$A_1 A_2$ on the right-hand side refers to the element of $I^{p_1 + p_2}$ obtained by concatenating
the tuples $A_1$ and $A_2$, and similarly for $B_1 B_2$.


In this way we see that the type $(1, 0)$ characteristic arrays $\varepsilon_{(i)}$, $i \in I$ (the
natural basis of $K^I$), and the type $(0, 1)$ characteristic arrays $\varepsilon^{(i)}$, $i \in I$ (the
natural basis of $(K^I)^*$), generate the tensor array algebra relative to the outer
multiplication operation.


The just-mentioned fact gives us an alternate way of writing and thinking about tensor
arrays. We introduce the basic symbols
\[ \varepsilon_{(i)}, \quad \varepsilon^{(i)}, \qquad i \in I, \]
subject to the commutation relations
\[ \varepsilon_{(i)} \varepsilon^{(i')} = \varepsilon^{(i')} \varepsilon_{(i)}, \qquad i, i' \in I, \]
add and multiply these symbols using coefficients in $K$, and use
\[ \varepsilon^{(j_1 \ldots j_q)}_{(i_1 \ldots i_p)}, \qquad i_1, \ldots, i_p, j_1, \ldots, j_q \in I, \]
as a handy abbreviation for
\[ \varepsilon_{(i_1)} \cdots \varepsilon_{(i_p)} \, \varepsilon^{(j_1)} \cdots \varepsilon^{(j_q)}. \]
We then interpret the resulting expressions as tensor arrays in the obvious fashion: the values
of the tensor array are just the coefficients of the $\varepsilon$ symbol matching the given index.
However, note that in the $\varepsilon$ symbols, the covariant data is written as a superscript, and
the contravariant data as a subscript. This is done to facilitate the Einstein summation
convention.


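By way of illustration, suppose that $I = (1, 2)$. We can now write down a type $(1, 0)$
tensor, i.e. a column vector
\[ u = \begin{pmatrix} u^1 \\ u^2 \end{pmatrix} \in T^{1,0}, \]
as
\[ u = u^1 \varepsilon_{(1)} + u^2 \varepsilon_{(2)}. \]
Similarly, a row vector
\[ \phi = (\phi_1, \phi_2) \in T^{0,1} \]
can be written down as
\[ \phi = \phi_1 \varepsilon^{(1)} + \phi_2 \varepsilon^{(2)}. \]
In the case of a matrix
\[ M = \begin{pmatrix} M^1_1 & M^1_2 \\ M^2_1 & M^2_2 \end{pmatrix} \in T^{1,1} \]
we would write
\[ M = M^1_1 \varepsilon^{(1)}_{(1)} + M^1_2 \varepsilon^{(2)}_{(1)}
     + M^2_1 \varepsilon^{(1)}_{(2)} + M^2_2 \varepsilon^{(2)}_{(2)}. \]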
238.3 multi-linear


Let $V_1, V_2, \ldots, V_n, W$ be vector spaces over a field $K$. A mapping
\[ M : V_1 \times V_2 \times \cdots \times V_n \to W \]
is called multi-linear or $n$-linear if $M$ is linear in each of its arguments.


Notes.


• A bilinear mapping is another name for a 2-linear mapping.
• This definition generalizes in an obvious way to rings and modules.
• An excellent example of a multi-linear map is the determinant operation.
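
Linearity in each argument can be tested directly on a small case. The sketch below treats the 2 x 2 determinant as a function of its two columns, which is a standard 2-linear map; the helper names `det2` and `lin` and the sample values are assumptions made for illustration.

```python
def det2(c1, c2):
    """Determinant of the 2x2 matrix whose columns are c1 and c2."""
    return c1[0] * c2[1] - c1[1] * c2[0]

def lin(s, x, t, y):
    """The linear combination s*x + t*y of two vectors."""
    return tuple(s * xi + t * yi for xi, yi in zip(x, y))

x, y, z = (1, 2), (3, -1), (5, 4)
s, t = 2, -3

# Linearity in the first argument, with the second argument held fixed.
first_lhs = det2(lin(s, x, t, y), z)
first_rhs = s * det2(x, z) + t * det2(y, z)

# Linearity in the second argument, with the first argument held fixed.
second_lhs = det2(z, lin(s, x, t, y))
second_rhs = s * det2(z, x) + t * det2(z, y)
```

Note that the determinant is not linear as a function of the whole matrix: scaling both columns by $s$ scales the value by $s^2$.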
238.4 outer multiplication


Note: the present entry employs the terminology and notation defined and described in the
entry on tensor arrays. To keep things reasonably self contained we mention that the symbol
$T^{p,q}$ refers to the vector space of type $(p, q)$ tensor arrays, i.e. maps
\[ I^p \times I^q \to K, \]
where $I$ is some finite list of index labels, and where $K$ is a field.

Let $p_1, p_2, q_1, q_2$ be natural numbers. Outer multiplication is a bilinear operation
\[ T^{p_1, q_1} \times T^{p_2, q_2} \to T^{p_1 + p_2, \, q_1 + q_2} \]
that combines a type $(p_1, q_1)$ tensor array $X$ and a type $(p_2, q_2)$ tensor array $Y$ to produce
a type $(p_1 + p_2, q_1 + q_2)$ tensor array $XY$ (also written as $X \otimes Y$), defined by
\[ (XY)^{i_1 \ldots i_{p_1 + p_2}}_{j_1 \ldots j_{q_1 + q_2}}
   = X^{i_1 \ldots i_{p_1}}_{j_1 \ldots j_{q_1}} \;
     Y^{i_{p_1 + 1} \ldots i_{p_1 + p_2}}_{j_{q_1 + 1} \ldots j_{q_1 + q_2}}. \]

Speaking informally, what is going on above is that we multiply every value of the $X$ array
by every possible value of the $Y$ array, to create a new array, $XY$. Consequently, the size of
$XY$ is the size of $X$ times the size of $Y$, and the index slots of the product $XY$ are just the
union of the index slots of $X$ and of $Y$.
Outer multiplication is a non-commutative, associative operation. The type $(0, 0)$ arrays are
the scalars, i.e. elements of $K$; they commute with everything. Thus, we can embed $K$ into
the direct sum
\[ \bigoplus_{p, q \in \mathbb{N}} T^{p,q}, \]
and thereby endow the latter with the structure of a $K$-algebra.¹


By way of illustration we mention that the outer product of a column vector, i.e. a type
$(1, 0)$ array, and a row vector, i.e. a type $(0, 1)$ array, gives a matrix, i.e. a type $(1, 1)$ tensor
array. For instance:
\[ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \otimes \begin{pmatrix} x & y & z \end{pmatrix}
   = \begin{pmatrix} ax & ay & az \\ bx & by & bz \\ cx & cy & cz \end{pmatrix},
   \qquad a, b, c, x, y, z \in K. \]
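
The definition translates almost literally into code if a tensor array is stored as a dictionary keyed by its index tuples. This is only a sketch under that assumed representation: the key layout `(upper indices, lower indices)` and the function name `outer` are choices made here, not notation from the entry.

```python
def outer(X, Y):
    """Outer product of tensor arrays stored as {(upper, lower): value} dicts.

    The upper (contravariant) and lower (covariant) index tuples of the two
    factors are concatenated, and the corresponding values are multiplied.
    """
    return {
        (ux + uy, lx + ly): x * y
        for (ux, lx), x in X.items()
        for (uy, ly), y in Y.items()
    }

labels = (1, 2, 3)                                           # the index list I
col = {((i,), ()): c for i, c in zip(labels, (2, 3, 5))}     # type (1,0) array
row = {((), (j,)): r for j, r in zip(labels, (7, 11, 13))}   # type (0,1) array
unit = {((i,), ()): c for i, c in zip(labels, (1, 0, 0))}    # another type (1,0) array

M = outer(col, row)        # type (1,1): M[((i,), (j,))] == col^i * row_j
XY = outer(col, unit)      # type (2,0)
YX = outer(unit, col)      # type (2,0), with the factors in the other order
```

The size of `M` is the product of the sizes of the factors, and comparing `XY` with `YX` at the same index shows that the operation is not commutative.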
238.5 tensor


Overview. A tensor is the mathematical idealization of a geometric or physical quantity
whose description, relative to a frame of reference, consists of an array of
numbers.² Some well known examples of tensors in geometry are quadratic forms and the
curvature tensor. Examples of physical tensors are the energy-momentum tensor and the
polarization tensor.


Geometric and physical quantities may be categorized by considering the degrees of freedom
inherent in their description. The scalar quantities are those that can be represented by a
single number (speed, mass, temperature, for example). There are also vector-like
quantities, such as force, that require a list of numbers for their description. Finally,
quantities such as quadratic forms naturally require a multiply indexed array for their
representation. These latter quantities can only be conceived of as tensors.


Actually, the tensor notion is quite general, and applies to all of the above examples; scalars
and vectors are special kinds of tensors. The feature that distinguishes a scalar from a vector,
and distinguishes both of those from a more general tensor quantity, is the number of indices
in the representing array. This number is called the rank of a tensor. Thus, scalars are rank
zero tensors (no indices at all), and vectors are rank one tensors.


¹ We will not pursue this line of thought here, because the topic of algebra structure is best
dealt with in a more abstract context. The same comment applies to the use of the sign
$\otimes$ in denoting outer multiplication. These topics are dealt with in the entry pertaining to
the abstract tensor algebra.

² "Ceci n'est pas une pipe," as René Magritte put it. The image and the object represented by the
image are not the same thing. The mass of a stone is not a number. Rather, the mass can be
described by a number relative to some specified unit mass.
It is also necessary to distinguish between two types of indices, depending on whether the
corresponding numbers transform covariantly or contravariantly relative to a change in the
frame of reference. Contravariant indices are written as superscripts, while the covariant
indices are written as subscripts. The valence of a tensor is the pair $(p, q)$, where $p$ is the
number of contravariant and $q$ the number of covariant indices, respectively.


It is customary to represent the actual tensor, as a stand-alone entity, by a bold-face symbol
such as $\mathbf{T}$. The corresponding array of numbers for a type $(p, q)$ tensor is denoted by the
symbol $T^{i_1 \ldots i_p}_{j_1 \ldots j_q}$, where the superscripts and subscripts are indices that vary
from 1 to $n$. This number $n$, the range of the indices, is called the dimension of the tensor.
The total number of degrees of freedom required for the specification of a particular tensor is
a power of the dimension; the exponent is the tensor's rank.

Again, it must be emphasized that the tensor $\mathbf{T}$ and the representing array
$T^{i_1 \ldots i_p}_{j_1 \ldots j_q}$ are not the same thing. The values of the representing array are
given relative to some frame of reference, and undergo a linear transformation when the
frame is changed.


Finally, it must be mentioned that most physical and geometric applications are concerned
with tensor fields, that is to say tensor valued functions, rather than tensors themselves.
Some care is required, because it is common to see a tensor field called simply a tensor.
There is a difference, however; the entries of a tensor array
$T^{i_1 \ldots i_p}_{j_1 \ldots j_q}$ are numbers, whereas the entries of a tensor field are functions.
The present entry treats the purely algebraic aspect of tensors. Tensor field concepts, which
typically involve derivatives of some kind, are discussed elsewhere.


Definition. The formal definition of a tensor quantity begins with a finite-dimensional
vector space $U$, which furnishes the "building blocks" for tensors of all valences.
In typical applications, $U$ is the tangent space at a point of a manifold; the elements of $U$
represent velocities and forces. The space of $(p, q)$-valent tensors, denoted here by $U^{p,q}$, is
obtained by taking the tensor product of $p$ copies of $U$ and $q$ copies of the dual vector space
$U^*$. To wit,
\[ U^{p,q} = \underbrace{U \otimes \cdots \otimes U}_{p \text{ times}} \otimes
             \underbrace{U^* \otimes \cdots \otimes U^*}_{q \text{ times}}. \]


In order to represent a tensor by a concrete array of numbers, we require a frame of reference,
which is essentially a basis of $U$, say
\[ e_1, \ldots, e_n \in U. \]
Every vector in $U$ can be "measured" relative to this basis, meaning that for every $v \in U$
there exist unique scalars $v^i$ such that (note the use of the Einstein summation convention)
\[ v = v^i e_i. \]
These scalars are called the components of $v$ relative to the frame in question.
Let $e^1, \ldots, e^n \in U^*$ be the corresponding dual basis, i.e.
\[ e^i(e_j) = \delta^i_j, \]
where the latter is the Kronecker delta array. For every $\alpha \in U^*$ there exists a
unique array of components $\alpha_i$ such that
\[ \alpha = \alpha_i e^i. \]
More generally, every tensor $\mathbf{T} \in U^{p,q}$ has a unique representation in terms of
components. That is to say, there exists a unique array of scalars
$T^{i_1 \ldots i_p}_{j_1 \ldots j_q}$ such that
\[ \mathbf{T} = T^{i_1 \ldots i_p}_{j_1 \ldots j_q} \,
   e_{i_1} \otimes \cdots \otimes e_{i_p} \otimes e^{j_1} \otimes \cdots \otimes e^{j_q}. \]


Transformation rule. Next, suppose that a change is made to a different frame of reference,
say
\[ \hat{e}_1, \ldots, \hat{e}_n \in U. \]
Any two frames are uniquely related by an invertible transition matrix $A^i_j$, having the
property that for all values of $j$ we have
\[ \hat{e}_j = A^i_j e_i. \qquad (238.5.1) \]
Let $v \in U$ be a vector, and let $v^i$ and $\hat{v}^i$ denote the corresponding component arrays
relative to the two frames. From
\[ v = v^i e_i = \hat{v}^i \hat{e}_i, \]
and from (238.5.1) we infer that
\[ \hat{v}^i = B^i_j v^j, \qquad (238.5.2) \]
where $B^i_j$ is the matrix inverse of $A^i_j$, i.e.
\[ A^i_k B^k_j = \delta^i_j. \]
Thus, the transformation rule for a vector's components (238.5.2) is contravariant to the
transformation rule for the frame of reference (238.5.1). It is for this reason that the
superscript indices of a vector are called contravariant.


To establish (238.5.2), we note that the transformation rule for the dual basis takes the form
\[ \hat{e}^i = B^i_j e^j, \]
and that
\[ v^i = e^i(v), \]
while
\[ \hat{v}^i = \hat{e}^i(v). \]


The transformation rule for covector components is covariant. Let $\alpha \in U^*$ be a given
covector, and let $\alpha_i$ and $\hat{\alpha}_i$ be the corresponding component arrays. Then
\[ \hat{\alpha}_j = A^i_j \alpha_i. \]
The above relation is easily established. We need only remark that
\[ \alpha_i = \alpha(e_i), \]
and that
\[ \hat{\alpha}_j = \alpha(\hat{e}_j), \]
and then use (238.5.1).


In light of the above discussion, we see that the transformation rule for a general type $(p, q)$
tensor takes the form
\[ \hat{T}^{i_1 \ldots i_p}_{j_1 \ldots j_q}
   = A^{l_1}_{j_1} \cdots A^{l_q}_{j_q} \, B^{i_1}_{k_1} \cdots B^{i_p}_{k_p} \,
     T^{k_1 \ldots k_p}_{l_1 \ldots l_q}. \]
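
The contravariant and covariant rules can be verified on a small numerical case. The following sketch takes $U = \mathbb{R}^2$ with the standard basis as the original frame; the particular matrix and component values are made up for illustration, and indices run from 0 in the code.

```python
n = 2
A = [[2, 1], [1, 1]]      # A[i][j] = A^i_j, so the new frame is e_hat_j = A^i_j e_i
B = [[1, -1], [-1, 2]]    # B[i][j] = B^i_j, the matrix inverse of A

v = [3, 5]                # components v^i of a vector in the old frame
alpha = [4, -1]           # components alpha_i of a covector in the old frame

# Contravariant rule (238.5.2): v_hat^i = B^i_j v^j.
v_hat = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]

# Covariant rule: alpha_hat_j = A^i_j alpha_i.
alpha_hat = [sum(A[i][j] * alpha[i] for i in range(n)) for j in range(n)]

# Rebuilding the vector from its new components: v^i = A^i_j v_hat^j.
v_back = [sum(A[i][j] * v_hat[j] for j in range(n)) for i in range(n)]

# The pairing alpha(v) is a scalar, so it must not depend on the frame.
pairing_old = sum(alpha[i] * v[i] for i in range(n))
pairing_new = sum(alpha_hat[j] * v_hat[j] for j in range(n))
```

The components change in opposite ways, with $B$ acting on the vector and $A$ on the covector, which is exactly why the scalar $\alpha(v)$ comes out the same in both frames.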


Version: 6 Owner: rmilson Author(s): rmilson, djao 


238.6 tensor algebra 


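Let $R$ be a ring, and $M$ an $R$-module. The tensor algebra $\mathcal{T}(M)$ is a graded
$R$-algebra with $n$-th graded component $\mathcal{T}_n(M)$ simply the $n$-th tensor power
$M^{\otimes n}$ of $M$, and multiplication $ab = a \otimes b \in \mathcal{T}_{n+m}(M)$ for
$a \in \mathcal{T}_n(M)$, $b \in \mathcal{T}_m(M)$. Thus
\[ \mathcal{T}(M) = \bigoplus_{n=0}^{\infty} \mathcal{T}_n(M)
   = R \oplus M \oplus (M \otimes M) \oplus \cdots \]

$\mathcal{T}$ is a functor, and the left adjoint to the forgetful functor from $R$-algebras to
$R$-modules. That is, every module homomorphism $M \to S$, where $S$ is an $R$-algebra, extends
to a unique $R$-algebra homomorphism $\mathcal{T}(M) \to S$.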
238.7 tensor array


Introduction. Tensor arrays, or tensors for short,³ are multidimensional arrays with two
types of indices (covariant and contravariant). Tensors are widely used in science and
mathematics, because these data structures are the natural choice of representation for a
variety of important physical and geometric quantities.


³ The word tensor has other meanings, c.f. the tensor entry.

In this entry we give the definition of a tensor array and establish some related terminology
and notation. The theory of tensor arrays incorporates a number of other essential topics:
basic tensors, tensor transformations, outer multiplication, contraction, inner multiplication,
and generalized transposition. These are fully described in their separate entries.


Valences and the space of tensor arrays. Let $K$ be a field⁴ and let $I$ be a finite list
of indices⁵, such as $(1, 2, \ldots, n)$. A tensor array of type
\[ (p, q), \qquad p, q \in \mathbb{N}, \]
is a mapping
\[ I^p \times I^q \to K. \]
The set of all such mappings will be denoted by $T^{p,q}(I, K)$, or when $I$ and $K$ are clear from
the context, simply as $T^{p,q}$. The numbers $p$ and $q$ are called, respectively, the contravariant
and the covariant valence of the tensor array.


Point-wise addition and scaling give $T^{p,q}$ the structure of a vector space of dimension
$n^{p+q}$, where $n$ is the cardinality of $I$. We will interpret $I^0$ as signifying a singleton set.
Consequently $T^{p,0}$ and $T^{0,q}$ are just the maps from, respectively, $I^p$ and $I^q$ to $K$. It is
also customary to identify $T^{1,0}$ with $K^I$, the vector space of list vectors indexed by $I$, and
to identify $T^{0,1}$ with the dual space $(K^I)^*$ of linear forms on $K^I$. Finally, $T^{0,0}$ can be
identified with $K$ itself. In other words, scalars are tensor arrays of zero valence.


Let $X : I^p \times I^q \to K$ be a type $(p, q)$ tensor array. In writing the values of $X$, it is
customary to write contravariant indices using superscripts, and covariant indices using
subscripts. Thus, for indices $i_1, \ldots, i_p, j_1, \ldots, j_q \in I$ we write
\[ X^{i_1 \ldots i_p}_{j_1 \ldots j_q} \]
instead of⁶
\[ X(i_1, \ldots, i_p; j_1, \ldots, j_q). \]

We also mention that it is customary to use columns to represent contravariant index
dimensions, and rows to represent the covariant index dimensions. Thus column vectors are type
$(1, 0)$ tensor arrays, row vectors are type $(0, 1)$ tensor arrays, and matrices, in as much as
they can be regarded either as rows of columns or as columns of rows, are type $(1, 1)$ tensor
arrays.⁷


⁴ In physics and differential geometry, $K$ is typically $\mathbb{R}$ or $\mathbb{C}$.
⁵ It is advantageous to allow general indexing sets, because one can indicate the use of multiple
frames of reference by employing multiple, disjoint sets of indices.
⁶ Curiously, the latter notation is preferred by some authors. See H. Weyl's books and papers, for
example.
⁷ It is also customary to use matrices to represent type $(2, 0)$ and type $(0, 2)$ tensor arrays
(the latter are used to represent quadratic forms). Speaking idealistically, such objects should be
typeset, respectively, as a column of column vectors and as a row of row vectors. However,
typographical constraints and notational convenience dictate that they be displayed as matrices.
Notes. It must be noted that our usage of the term tensor array is non-standard. The
traditionally inclined authors simply call these data structures tensors. We bother to make
the distinction because the traditional nomenclature is ambiguous and doesn't include the
modern mathematical understanding of the tensor concept. (This is explained more fully
in the tensor entry.) Precise and meaningful definitions can only be given by treating the
concept of a tensor array as distinct from the concept of a geometric/abstract tensor.


We also mention that the term tensor is often applied to objects that should more
appropriately be termed a tensor field. The latter are tensor-valued functions, or more generally
sections of a tensor bundle. A tensor is what one gets by evaluating a tensor field at one
point. Informally, one can also think of a tensor field as a tensor whose values are functions,
rather than constants.
238.8 tensor product (vector spaces)


Definition. The classical conception of the tensor product operation involved finite
dimensional vector spaces $A$, $B$, say over a field $K$. To describe the tensor product
$A \otimes B$ one was obliged to choose bases
\[ a_i \in A, \; i \in I, \qquad b_j \in B, \; j \in J, \]
of $A$ and $B$ indexed by sets $I$ and $J$, respectively, and represent elements $a \in A$
and $b \in B$ by their coordinates relative to these bases, i.e. as mappings $a : I \to K$ and
$b : J \to K$ such that
\[ a = \sum_{i \in I} a^i a_i, \qquad b = \sum_{j \in J} b^j b_j. \]


One then represented $A \otimes B$ relative to this particular choice of bases as the vector space
of mappings $c : I \times J \to K$. These mappings were called "second-order contravariant tensors"
and their values were customarily denoted by superscripts, a.k.a. contravariant indices:
\[ c^{ij} \in K, \qquad i \in I, \; j \in J. \]
The canonical bilinear multiplication (also known as outer multiplication)
\[ \otimes : A \times B \to A \otimes B \]
was defined by representing $a \otimes b$, relative to the chosen bases, as the tensor
\[ c^{ij} = a^i b^j, \qquad i \in I, \; j \in J. \]
In this system, the products
\[ a_i \otimes b_j, \qquad i \in I, \; j \in J, \]
were represented by basic tensors, specified in terms of the Kronecker deltas as the mappings
\[ (i', j') \mapsto \delta^{i'}_i \delta^{j'}_j, \qquad i' \in I, \; j' \in J. \]
These gave a basis of $A \otimes B$.

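The construction is independent of the choice of bases in the following sense. Let
\[ a'_{i'} \in A, \; i' \in I', \qquad b'_{j'} \in B, \; j' \in J', \]
be different bases of $A$ and $B$ with indexing sets $I'$ and $J'$ respectively. Let
\[ r : I \times I' \to K, \qquad s : J \times J' \to K, \]
be the corresponding change of basis matrices determined by
\[ a'_{i'} = \sum_{i \in I} (r^i_{i'}) \, a_i, \quad i' \in I', \qquad
   b'_{j'} = \sum_{j \in J} (s^j_{j'}) \, b_j, \quad j' \in J'. \]
One then stipulated that tensors $c : I \times J \to K$ and $c' : I' \times J' \to K$ represent the
same element of $A \otimes B$ if
\[ c^{ij} = \sum_{i' \in I', \, j' \in J'} (r^i_{i'}) (s^j_{j'}) \, (c')^{i'j'} \qquad (238.8.1) \]
for all $i \in I$, $j \in J$. This relation corresponds to the fact that the products
\[ a'_{i'} \otimes b'_{j'}, \qquad i' \in I', \; j' \in J', \]
constitute an alternate basis of $A \otimes B$, and that the change of basis relations are
\[ a'_{i'} \otimes b'_{j'} = \sum_{i \in I, \, j \in J} (r^i_{i'}) (s^j_{j'}) \, a_i \otimes b_j,
   \qquad i' \in I', \; j' \in J'. \qquad (238.8.2) \]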

Notes. Historically, the tensor product was called the outer product, and has its origins in
the absolute differential calculus (the theory of manifolds). The old-time tensor calculus is
difficult to understand because it is afflicted with a particularly lethal notation that makes
coherent comprehension all but impossible. Instead of talking about an element $a$ of a vector
space, one was obliged to contemplate a symbol $a^i$, which signified a list of real numbers
indexed by $1, 2, \ldots, n$, and which was understood to represent $a$ relative to some specified,
but unnamed basis.


What makes this notation truly lethal is the fact that a symbol $a^j$ was taken to signify an
alternate list of real numbers, also indexed by $1, \ldots, n$, and also representing $a$, albeit
relative to a different, but equally unspecified basis. Note that the choice of dummy variables
makes all the difference. Any sane system of notation would regard the expression
\[ a^i, \qquad i = 1, \ldots, n, \]
as representing a list of $n$ symbols
\[ a^1, a^2, \ldots, a^n. \]
However, in the classical system, one was strictly forbidden from using
\[ a^1, a^2, \ldots, a^n \]
because where, after all, is the all important dummy variable to indicate choice of basis?


Thankfully, it is possible to shed some light onto this confusion (I have read that this is
credited to Roger Penrose) by interpreting the symbol $a^i$ as a mapping from some finite
index set $I$ to $\mathbb{R}$, whereas $a^j$ is interpreted as a mapping from another finite index
set $J$ (of equal cardinality) to $\mathbb{R}$.


My own surmise is that the source of this notational difficulty stems from the reluctance of
the ancients to deal with geometric objects directly. The prevalent superstition of the age
held that in order to have meaning, a geometric entity had to be measured relative to some
basis. Of course, it was understood that geometrically no one basis could be preferred to any
other, and this leads directly to the definition of geometric entities as lists of measurements
modulo the equivalence engendered by changing the basis.


It is also worth remarking on the contravariant nature of the relationship between the actual
elements of $A \otimes B$ and the corresponding representation by tensors relative to a basis;
compare equations (238.8.1) and (238.8.2). This relationship is the source of the terminology
"contravariant tensor" and "contravariant index", and I surmise that it is this very medieval pit
of darkness and confusion that spawned the present-day notion of "contravariant functor".


References. 
1. Levi-Civita, “The Absolute Differential Calculus.” 
238.9 tensor transformations


The present entry employs the terminology and notation defined and described in the entries
on tensor arrays and basic tensors. To keep things reasonably self contained we mention
that the symbol $T^{p,q}$ refers to the vector space of type $(p, q)$ tensor arrays, i.e. maps
\[ I^p \times I^q \to K, \]
where $I$ is some finite list of index labels, and where $K$ is a field. The symbols
$\varepsilon_{(i)}, \varepsilon^{(i)}$, $i \in I$, refer to the column and row vectors giving the
natural basis of $T^{1,0}$ and $T^{0,1}$, respectively.


Let $I$ and $J$ be two finite lists of equal cardinality, and let
\[ T : K^I \to K^J \]
be a linear isomorphism. Every such isomorphism is uniquely represented by an invertible
matrix
\[ M : J \times I \to K \]
with entries given by
\[ M^j_i = (T \varepsilon_{(i)})^j, \qquad i \in I, \; j \in J. \]
In other words, the action of $T$ is described by the following substitutions:
\[ \varepsilon_{(i)} \mapsto \sum_{j \in J} M^j_i \, \varepsilon_{(j)}, \qquad i \in I. \qquad (238.9.1) \]
Equivalently, the action of $T$ is given by matrix-multiplication of column vectors in $K^I$ by
$M$.


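The corresponding substitution relations for the type $(0, 1)$ tensors involve the inverse matrix
$M^{-1} : I \times J \to K$, and take the form⁸
\[ \varepsilon^{(i)} \mapsto \sum_{j \in J} (M^{-1})^i_j \, \varepsilon^{(j)}, \qquad i \in I.
   \qquad (238.9.2) \]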


The rules for type $(0, 1)$ substitutions are what they are because of the requirement that
the $\varepsilon_{(i)}$ and $\varepsilon^{(i)}$ remain dual bases even after the substitution. In other
words we want the substitutions to preserve the relations
\[ \varepsilon^{(i_1)} \varepsilon_{(i_2)} = \delta^{i_1}_{i_2}, \qquad i_1, i_2 \in I, \]
where the left-hand side of the above equation features the inner product and the right-hand
side the Kronecker delta. Given that the vector basis transforms as in (238.9.1) and given
the above constraint, the substitution rules for the linear form basis, shown in (238.9.2), are
the only such possible.


The classical terminology of contravariant and covariant indices is motivated by thinking in
terms of substitutions. Thus, suppose we perform a linear substitution and change a vector,
i.e. a type $(1, 0)$ tensor, $X \in K^I$ into a vector $Y \in K^J$. The indexed values of the former
and of the latter are related by
\[ Y^j = \sum_{i \in I} M^j_i X^i, \qquad j \in J. \qquad (238.9.3) \]
Thus, we see that the "transformation rule" for indices is contravariant to the substitution
rule (238.9.1) for basis vectors.

⁸ The above relations describe the action of the dual homomorphism of the inverse transformation
$(T^{-1})^* : (K^I)^* \to (K^J)^*$.
In modern terms, this contravariance is best described by saying that the dual space
construction is a contravariant functor. In other words, the substitution rule for the linear
forms, i.e. the type $(0, 1)$ tensors, is contravariant to the substitution rule for vectors:
\[ \varepsilon^{(j)} \mapsto \sum_{i \in I} M^j_i \, \varepsilon^{(i)}, \qquad j \in J,
   \qquad (238.9.4) \]
in full agreement with the relation shown in (238.9.2). Everything comes together, and
equations (238.9.3) and (238.9.4) are seen to be one and the same, once we remark that
tensor array values can be obtained by contracting with characteristic arrays. For example,
\[ X^i = \varepsilon^{(i)}(X), \quad i \in I; \qquad Y^j = \varepsilon^{(j)}(Y), \quad j \in J. \]


Finally we must remark that the transformation rule for covariant indices involves the inverse
matrix $M^{-1}$. Thus if $\alpha \in T^{0,1}(I)$ is transformed to a $\beta \in T^{0,1}(J)$, the indices
will be related by
\[ \beta_j = \sum_{i \in I} (M^{-1})^i_j \, \alpha_i, \qquad j \in J. \]


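The most general transformation rule for tensor array indices is therefore the following: the
indexed values of a tensor array $X \in T^{p,q}(I)$ and the values of the transformed tensor array
$Y \in T^{p,q}(J)$ are related by
\[ Y^{j_1 \ldots j_p}_{l_1 \ldots l_q}
   = \sum_{\substack{i_1, \ldots, i_p \in I \\ k_1, \ldots, k_q \in I}}
     M^{j_1}_{i_1} \cdots M^{j_p}_{i_p} \,
     (M^{-1})^{k_1}_{l_1} \cdots (M^{-1})^{k_q}_{l_q} \,
     X^{i_1 \ldots i_p}_{k_1 \ldots k_q}, \]
for all possible choices of indices $j_1, \ldots, j_p, l_1, \ldots, l_q \in J$. Debauch of indices,
indeed!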
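
For a type $(1, 1)$ array the general rule reduces to one factor of $M$ and one factor of $M^{-1}$, which is the familiar conjugation of a matrix. The sketch below uses made-up integer values and 0-based indices; it also checks that contracting the upper index against the lower one (the trace) is unaffected by the transformation.

```python
n = 2
M = [[2, 1], [1, 1]]          # M[j][i] = M^j_i
M_inv = [[1, -1], [-1, 2]]    # M_inv[k][l] = (M^{-1})^k_l
X = [[1, 2], [3, 4]]          # X[i][k] = X^i_k, a type (1,1) tensor array

# Y^j_l = sum over i, k of M^j_i (M^{-1})^k_l X^i_k
Y = [[sum(M[j][i] * M_inv[k][l] * X[i][k] for i in range(n) for k in range(n))
      for l in range(n)] for j in range(n)]

# Contraction of the contravariant index with the covariant one.
trace_X = sum(X[i][i] for i in range(n))
trace_Y = sum(Y[j][j] for j in range(n))
```

The invariance of the trace reflects the fact that one index is transformed by $M$ and the other by $M^{-1}$, so the two factors cancel under contraction.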
Chapter 239


15A72 — Vector and tensor algebra, 
theory of invariants 


239.1 bac-cab rule 


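The bac-cab rule states that for vectors $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ (that can be
either real or complex) in $\mathbb{R}^3$, we have
\[ \mathbf{A} \times (\mathbf{B} \times \mathbf{C})
   = \mathbf{B} (\mathbf{A} \cdot \mathbf{C}) - \mathbf{C} (\mathbf{A} \cdot \mathbf{B}). \]
Here $\times$ is the cross product and $\cdot$ is the real inner product.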
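
The identity is easy to confirm on sample vectors. The following is a short sketch with integer components chosen arbitrarily; the helper names `cross` and `dot` are not taken from the entry.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Real inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

A, B, C = (1, 2, 3), (4, -5, 6), (-7, 8, 9)

lhs = cross(A, cross(B, C))                                       # A x (B x C)
rhs = tuple(dot(A, C) * b - dot(A, B) * c for b, c in zip(B, C))  # B(A.C) - C(A.B)

# The cross product is not associative: (A x B) x C differs in general.
other = cross(cross(A, B), C)
```

The last line is a reminder that the position of the parentheses matters, which is why the rule is stated for $\mathbf{A} \times (\mathbf{B} \times \mathbf{C})$ specifically.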
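239.2 cross product


The cross product of two vectors is a vector orthogonal to the plane of the two vectors
being crossed, whose magnitude is equal to the area of the parallelogram defined by the two
vectors. Notice there can be two such vectors. The cross product produces the vector that
would be in a right-handed coordinate system with the plane. It is exclusively for use in
$\mathbb{R}^3$, as you can see from the definition. We write the cross product of the vectors
$\vec{a}$ and $\vec{b}$ as
\[ \vec{a} \times \vec{b}
   = \det \begin{pmatrix} \hat{i} & \hat{j} & \hat{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix}
   = \det \begin{pmatrix} a_2 & a_3 \\ b_2 & b_3 \end{pmatrix} \hat{i}
   - \det \begin{pmatrix} a_1 & a_3 \\ b_1 & b_3 \end{pmatrix} \hat{j}
   + \det \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \hat{k} \]
with $\vec{a} = a_1 \hat{i} + a_2 \hat{j} + a_3 \hat{k}$ and
$\vec{b} = b_1 \hat{i} + b_2 \hat{j} + b_3 \hat{k}$, where $\{\hat{i}, \hat{j}, \hat{k}\}$ is any
right-handed basis for $\mathbb{R}^3$.

Note that $\vec{a} \times \vec{b} = -\vec{b} \times \vec{a}$.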

Properties of the Cross Product: 
• From the expression for the area of a parallelogram we obtain
$|\vec{a} \times \vec{b}| = |\vec{a}| |\vec{b}| \sin \theta$, where $\theta$ is the angle between
$\vec{a}$ and $\vec{b}$.


• From the above, you can see that the cross product of two parallel vectors, or of a vector
with itself, is $\vec{0}$, since $\sin 0 = 0$. (If either vector is $\vec{0}$, the product vanishes
as well.)
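
These properties can be checked on concrete vectors. The sketch below expands the determinant formula into components; to avoid computing an angle it uses the equivalent form $|\vec{a} \times \vec{b}|^2 = |\vec{a}|^2 |\vec{b}|^2 - (\vec{a} \cdot \vec{b})^2$ of the magnitude formula, which follows from $\sin^2 \theta = 1 - \cos^2 \theta$. The sample values and helper names are illustrative.

```python
def cross(a, b):
    """Cross product in R^3, from expanding the determinant along its first row."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b = (1, 2, 3), (4, -5, 6)
c = cross(a, b)

anti = cross(b, a)                      # equals -c
perp = (dot(a, c), dot(b, c))           # c is orthogonal to both a and b
area_sq = dot(c, c)                     # squared area of the parallelogram
lagrange = dot(a, a) * dot(b, b) - dot(a, b) ** 2
parallel = cross(a, (2, 4, 6))          # (2, 4, 6) is a scalar multiple of a
```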
239.3 euclidean vector


A euclidean vector is a geometric entity that has the properties of magnitude and direction.
For example, a vector in $\mathbb{R}^2$ can be represented by its components like this, $(3, 4)$,
or like this, $\begin{pmatrix} 3 \\ 4 \end{pmatrix}$. This particular vector can be represented
geometrically like this:

[Figure a: the vector (3, 4) drawn as an arrow.]

In $\mathbb{R}^n$, a vector can be easily constructed as the line segment between points whose
differences in each coordinate are the components of the vector. A vector constructed at the
origin (the vector $(3, 4)$ drawn from the point $(0, 0)$ to $(3, 4)$) is called a position vector.
Note that a vector that is not a position vector is independent of position. It remains the same
vector unless one changes its magnitude or direction.

Magnitude: The magnitude of a vector is the distance from one end of the line segment to
the other. The magnitude of the vector comes from the metric of the space it is embedded
in. Its magnitude is also referred to as the norm or length of the vector. In Euclidean space
this can be obtained using the Pythagorean theorem in $\mathbb{R}^n$: for a vector
$\vec{x} \in \mathbb{R}^n$, the length $|\vec{x}|$ is such that
\[ |\vec{x}| = \sqrt{\sum_{i=1}^n x_i^2}. \]
This can also be found by the dot product: $|\vec{x}| = \sqrt{\vec{x} \cdot \vec{x}}$.


Direction: A vector basically is a direction. However, you may want to know the angle
it makes with another vector. In this case, one can use the dot product for the simplest
computation, since for two vectors $\vec{a}$, $\vec{b}$,
\[ \vec{a} \cdot \vec{b} = |\vec{a}| |\vec{b}| \cos \theta. \]
Since $\theta$ is the angle between the vectors, you just solve for $\theta$. Note that if you want
to find the angle the vector makes with one of the axes, you can use trigonometry.

In this case, the length of the vector is just the hypotenuse of a single right triangle, and the
angle $\theta$ is just $\arctan(y/x)$, the arctangent of the ratio of its components.

Projection, resolving a vector into its components: Say all we had was the magnitude
and angle of the vector to the x-axis and we needed the components. (Maybe we need to do
a cross product or addition.) Look at figure a again. The x-component of the vector can be
likened to a "plumb line" dropped from the arrowhead of the vector to the x-axis so that it
is perpendicular to the x-axis. Thus, to get the x-component, all we would have to do is
multiply the magnitude of the vector by the cosine of $\theta$. This is called resolving the vector
into its components.

Now say we had another vector at an angle $\theta_2$ to our vector. We construct a line between
the arrowhead of our vector and the other vector in the same way, meeting it perpendicularly. The
length from the tail of the vector to this intersection is called the projection of our vector
onto the other. Note that the equation remains the same: the projection is still
$|\vec{a}| \cos \theta_2$. This formula is valid in all higher dimensions, as the angle between two
vectors is measured in the plane they span.
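
The magnitude, angle, component and projection computations above can be carried out for the example vector $(3, 4)$. This is a small sketch using floating-point arithmetic; the second vector `w` is an arbitrary choice made for illustration.

```python
import math

v = (3.0, 4.0)
magnitude = math.sqrt(sum(c * c for c in v))   # Pythagorean theorem
theta = math.atan2(v[1], v[0])                 # angle to the x-axis, arctan(4/3)

# Resolving into components: magnitude times the cosine (x) or sine (y) of the angle.
vx = magnitude * math.cos(theta)
vy = magnitude * math.sin(theta)

# Projection of v onto another vector w: |v| cos(theta_2), where the angle
# theta_2 between the two vectors is obtained from the dot product.
w = (1.0, 1.0)
dot = sum(a * b for a, b in zip(v, w))
w_length = math.sqrt(sum(c * c for c in w))
cos_theta2 = dot / (magnitude * w_length)
projection = magnitude * cos_theta2
```

Because of rounding, the recovered components should be compared with the originals using a tolerance (for example `math.isclose`) rather than exact equality.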
Miscellaneous Properties: 

- Two vectors that are parallel to each other are called "collinear", as they can be drawn
onto the same line. Collinear vectors are scalar multiples of each other.

- Two non-collinear vectors are coplanar.

- There are two main types of products between vectors: the dot product (scalar product)
and the cross product (vector product). There is also a commonly used combination called
the triple scalar product.
239.4 rotational invariance of cross product


Theorem
Let $R$ be a rotational $3 \times 3$ matrix, i.e., a real matrix with $\det R = 1$ and
$R^{-1} = R^T$. Then for all vectors $u$, $v$ in $\mathbb{R}^3$,
\[ R \cdot (u \times v) = (R \cdot u) \times (R \cdot v). \]


Proof. Let us first fix some right hand oriented orthonormal basis in $\mathbb{R}^3$. Further, let
$\{u^1, u^2, u^3\}$ and $\{v^1, v^2, v^3\}$ be the components of $u$ and $v$ in that basis. Also, in
the chosen basis, we denote the entries of $R$ by $R_{ij}$. Since $R$ is rotational, we have
$R_{ij} R_{kj} = \delta_{ik}$, where $\delta_{ik}$ is the Kronecker delta symbol. Here we use the
Einstein summation convention. Thus, in the previous expression, on the left hand side, $j$ should
be summed over $1, 2, 3$. We shall use the Levi-Civita permutation symbol $\varepsilon$ to write the
cross product. Then the $i$:th coordinate of $u \times v$ equals
$(u \times v)^i = \varepsilon^{ijk} u^j v^k$. For the $k$:th component of
$(R \cdot u) \times (R \cdot v)$ we then have
\[ ((R \cdot u) \times (R \cdot v))^k
   = \varepsilon^{imk} R_{ij} R_{mn} u^j v^n \]
\[ = \varepsilon^{imr} \delta_{kr} R_{ij} R_{mn} u^j v^n \]
\[ = \varepsilon^{imr} R_{kl} R_{rl} R_{ij} R_{mn} u^j v^n \]
\[ = \varepsilon^{jnl} \det R \; R_{kl} u^j v^n. \]
The last line follows since
$\varepsilon^{ijk} R_{im} R_{jn} R_{kr} = \varepsilon^{mnr} \varepsilon^{ijk} R_{i1} R_{j2} R_{k3}
= \varepsilon^{mnr} \det R$. Since $\det R = 1$, it follows that
\[ ((R \cdot u) \times (R \cdot v))^k = R_{kl} \varepsilon^{jnl} u^j v^n \]
\[ = R_{kl} (u \times v)^l \]
\[ = (R \cdot (u \times v))^k \]
as claimed.
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Chapter 240 


15A75 — Exterior algebra, Grassmann 
algebras 


240.1 contraction 


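Definition Let $\omega$ be a $k$-form on a smooth manifold $M$, and let $\xi$ be a smooth
vector field on $M$. The contraction of $\omega$ with $\xi$ is the smooth $(k-1)$-form that maps
$x \in M$ to $\omega_x(\xi_x, \cdot)$. In other words, $\omega$ is point-wise evaluated with $\xi$ in the first slot. We
shall denote this $(k-1)$-form by $\iota_\xi \omega$. If $\omega$ is a 0-form, we set $\iota_\xi \omega = 0$ for all $\xi$.

Let $\omega$ and $\xi$ be as above. Then the following properties hold:

1. For any real number $k$,

\[ \iota_{k\xi} \omega = k \, \iota_\xi \omega. \]

2. For vector fields $\xi$ and $\eta$,

\begin{align*}
\iota_{\xi + \eta} \omega &= \iota_\xi \omega + \iota_\eta \omega, \\
\iota_\xi \iota_\eta \omega &= -\iota_\eta \iota_\xi \omega, \\
\iota_\xi \iota_\xi \omega &= 0.
\end{align*}

3. Contraction is an anti-derivation [1]. If $\omega^1$ is a $p$-form, and $\omega^2$ is a $q$-form, then

\[ \iota_\xi (\omega^1 \wedge \omega^2) = (\iota_\xi \omega^1) \wedge \omega^2 + (-1)^p \, \omega^1 \wedge (\iota_\xi \omega^2). \]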
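The first two properties can be tried out at a single point, where a $k$-form is just an alternating multilinear function of $k$ vectors. The sketch below makes that simplifying assumption: it models forms as plain Python functions on vectors of $\mathbb{R}^3$ and uses the determinant as a sample 3-form. The names `det3` and `contract` and all sample vectors are hypothetical and chosen only for illustration.

```python
def det3(u, v, w):
    """The standard volume 3-form on R^3, evaluated on three vectors."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def contract(omega, xi):
    """Return the form obtained by placing xi in the first slot of omega."""
    return lambda *vectors: omega(xi, *vectors)

xi, eta = (1.0, 2.0, 0.0), (0.0, 1.0, -1.0)
v, w = (0.0, 1.0, 1.0), (2.0, -1.0, 3.0)

# Anti-commutativity of repeated contraction.
a = contract(contract(det3, eta), xi)(w)  # (i_xi i_eta omega)(w) = omega(eta, xi, w)
b = contract(contract(det3, xi), eta)(w)  # (i_eta i_xi omega)(w) = omega(xi, eta, w)
print(a, b)  # the two values are negatives of each other

# Contracting twice with the same vector gives zero.
print(contract(contract(det3, xi), xi)(w))

# Additivity in the vector being contracted.
xi_plus_eta = tuple(p + q for p, q in zip(xi, eta))
additive_gap = (contract(det3, xi_plus_eta)(v, w)
                - contract(det3, xi)(v, w) - contract(det3, eta)(v, w))
print(additive_gap)
```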
REFERENCES
1. T. Frankel, Geometry of physics, Cambridge University Press, 1997.

240.2 exterior algebra
Let $V$ be a vector space over a field $K$. The exterior algebra over $V$, commonly denoted by
$\Lambda(V)$, is an associative $K$-algebra with unit together with a linear injection
\[ \lambda : V \to \Lambda(V), \]
which is characterized, up to isomorphism, by the universal properties given below. Note
that the exterior algebra product operation is most commonly denoted by a wedge symbol:
$\wedge$. Also note that the accepted convention is to identify $v \in V$ with its image $\lambda(v) \in \Lambda(V)$,
i.e. we don’t bother writing $\lambda(v)$ and just write $v$ instead.

• The exterior product is anti-symmetric in the sense that for all $v \in V$ we have
\[ v \wedge v = 0. \]

• Let $A$ be an associative $K$-algebra with unit. Every $K$-linear homomorphism
\[ \phi : V \to A \]
such that
\[ \phi(v)\phi(v) = 0, \quad v \in V \]
lifts to a unique $K$-algebra homomorphism
\[ \tilde\phi : \Lambda(V) \to A \]
such that
\[ \phi(v) = \tilde\phi(\lambda(v)), \quad v \in V. \]
Diagrammatically:
\[
\begin{array}{ccc}
V & \xrightarrow{\;\lambda\;} & \Lambda(V) \\
 & {\scriptstyle \phi} \searrow & \downarrow {\scriptstyle \tilde\phi} \\
 & & A
\end{array}
\]

So much for the abstract definition. It is concise, but does little to illuminate the nature of
the elements of $\Lambda(V)$. To that end, let us say that an $\alpha \in \Lambda(V)$ is $k$-primitive if there exist
$v_1, \ldots, v_k \in V$ such that
\[ \alpha = v_1 \wedge \cdots \wedge v_k, \]
and say that an element of $\Lambda(V)$ is $k$-homogeneous if it is a linear combination of
$k$-primitive elements. We then define $\Lambda^0(V)$ to be the span of the unit element, define $\Lambda^1(V)$
to be the image of the canonical injection $\lambda$, and for $k \geq 2$ define $\Lambda^k(V)$ to be the vector space
of all $k$-homogeneous elements of $\Lambda(V)$.

Proposition 12. The above grading gives $\Lambda(V)$ the structure of an anti-symmetrically
$\mathbb{N}$-graded algebra. To be more precise,
\[ \Lambda(V) = \bigoplus_{k=0}^{\infty} \Lambda^k(V), \]
such that if $\alpha \in \Lambda^j(V)$ and $\beta \in \Lambda^k(V)$, then
\[ \alpha \wedge \beta \in \Lambda^{j+k}(V) \]
with
\[ \beta \wedge \alpha = (-1)^{jk} \, \alpha \wedge \beta. \]

Proof. The essence of the proof is that we construct a model of the exterior algebra, and
then use the universality properties of $\Lambda(V)$ to show that it is isomorphic to that model. To
that end, for $k \in \mathbb{N}$ let $V^{\otimes k}$ denote the $k$-fold tensor product of $V$ with itself. Note: it must
be understood that $V^{\otimes 0} = K$ and that $V^{\otimes 1} = V$. For $k \geq 2$ we let $R^k$ be the span of all
elements
\[ v_1 \otimes \cdots \otimes v_k \in V^{\otimes k}, \]
such that
\[ v_j = v_{j+1} \]
for some $j = 1, \ldots, k-1$. We then set
\[ E^k = V^{\otimes k} / R^k \]
and set
\[ E = \bigoplus_{k=0}^{\infty} E^k. \]

It can be easily shown that the tensor product multiplication
\[ \otimes : V^{\otimes j} \times V^{\otimes k} \to V^{\otimes (j+k)} \]
factors through to the quotient and gives a well-defined associative, anti-symmetric product
on $E$. From there, the universality properties of the tensor product and the universality
properties of $\Lambda(V)$ imply that $E$ and $\Lambda(V)$ are isomorphic algebras. Q.E.D.


In the case that $V$ is a finite-dimensional vector space one can give some more “down-to-earth”
definitions that may be helpful in understanding the nature of the exterior product.
Suppose then, that $V$ is an $n$-dimensional real vector space, and let $V^*$ denote the dual space
of linear forms on $V$. Note that $(V^*)^{\otimes k}$ is naturally identified with the vector space of
multi-linear mappings $V^k \to \mathbb{R}$. It therefore makes sense to identify $\Lambda^k(V^*)$ with the vector
space of anti-symmetric, multi-linear mappings $V^k \to \mathbb{R}$. Next, we define the alternation operator
\[ A_k : (V^*)^{\otimes k} \to \Lambda^k(V^*) \]
as follows. For $\alpha \in (V^*)^{\otimes k}$ we define $A_k(\alpha) \in \Lambda^k(V^*)$ by
\[ A_k(\alpha)(v_1, \ldots, v_k) = \frac{1}{k!} \sum_{\pi} \mathrm{sgn}(\pi) \, \alpha(v_{\pi_1}, \ldots, v_{\pi_k}), \quad v_1, \ldots, v_k \in V, \]
where the right-hand sum is taken over all permutations $\pi$ of $\{1, \ldots, k\}$, and $\mathrm{sgn}(\pi) = \pm 1$
according to the parity of the permutation. Let us also note that $A_k$ restricts to the
identity on $\Lambda^k(V^*)$. Finally, for $\alpha \in \Lambda^j(V^*)$ and $\beta \in \Lambda^k(V^*)$ we define
\[ \alpha \wedge \beta = A_{j+k}(\alpha \otimes \beta). \]

Proposition 13. The above wedge product is an associative product on
\[ \bigoplus_{k=0}^{\infty} \Lambda^k(V^*) \]
and makes the latter a model of $\Lambda(V^*)$, the exterior algebra on $V^*$.
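The alternation operator and the wedge product defined through it can be sketched directly, treating multilinear mappings as Python functions of several vector arguments. This is a minimal illustration rather than an efficient implementation: the names `sign`, `alternate`, `tensor` and `wedge`, the explicit degree arguments, and the sample forms on $\mathbb{R}^3$ are assumptions made for the example.

```python
import itertools
import math

def sign(perm):
    """Sign (+1 or -1) of a permutation of 0..k-1, from the parity of its inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alternate(alpha, k):
    """Return A_k(alpha) for a multilinear function alpha of k vector arguments."""
    def result(*vectors):
        total = 0.0
        for perm in itertools.permutations(range(k)):
            total += sign(perm) * alpha(*(vectors[i] for i in perm))
        return total / math.factorial(k)
    return result

def tensor(alpha, j, beta):
    """Tensor product: alpha takes the first j arguments, beta takes the rest."""
    return lambda *vectors: alpha(*vectors[:j]) * beta(*vectors[j:])

def wedge(alpha, j, beta, k):
    """Wedge product of a j-form and a k-form: A_{j+k} applied to the tensor product."""
    return alternate(tensor(alpha, j, beta), j + k)

# Coordinate 1-forms on R^3 (hypothetical sample data).
dx = lambda v: v[0]
dy = lambda v: v[1]

u, v = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
dx_dy = wedge(dx, 1, dy, 1)
print(dx_dy(u, v), dx_dy(v, u))  # the two values differ only in sign
```

Applying `alternate` to `dx_dy` again should return the same values, in line with the remark that $A_k$ restricts to the identity on anti-symmetric mappings.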


A description of the exterior algebra in terms of a basis may also be useful. Therefore, let
$e_1, \ldots, e_n$ be a basis of $V$. For every sequence $I = (i_1, \ldots, i_k)$ of natural numbers between 1
and $n$ let $e_I$ denote the primitive element $e_{i_1} \wedge \cdots \wedge e_{i_k}$. If $I$ is the empty sequence, we use
$e_I$ to denote the unit element of the exterior algebra. Note that
\[ e_I = 0 \]
if and only if $I$ contains duplicate entries. For a permutation $\pi$ of $\{1, \ldots, k\}$ let $\pi(I)$ denote
the sequence $(i_{\pi_1}, \ldots, i_{\pi_k})$, and note that
\[ e_{\pi(I)} = \mathrm{sgn}(\pi) \, e_I. \]

Proposition 14. The exterior algebra $\Lambda(V)$ is a $2^n$-dimensional vector space with basis $\{e_I\}$,
where the index $I$ runs over all strictly increasing sequences (including the empty sequence)
of natural numbers between 1 and $n$.
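The basis description lends itself to a small computation. The sketch below represents $e_I$ by its index tuple, rewrites an arbitrary index sequence as a sign times a strictly increasing one, and counts the basis elements. The names `normalize`, `wedge_basis` and `basis`, and the choice $n = 4$, are assumptions introduced only for this illustration.

```python
import itertools

def normalize(indices):
    """Rewrite e_I as sign * e_J with J strictly increasing.

    Returns (0, ()) when I has a duplicate entry, since then e_I = 0.
    """
    if len(set(indices)) != len(indices):
        return 0, ()
    inversions = sum(1 for a in range(len(indices)) for b in range(a + 1, len(indices))
                     if indices[a] > indices[b])
    return (-1 if inversions % 2 else 1), tuple(sorted(indices))

def wedge_basis(I, J):
    """Wedge product of e_I and e_J, returned as (sign, sorted index tuple)."""
    return normalize(tuple(I) + tuple(J))

def basis(n):
    """All strictly increasing sequences of numbers between 1 and n, including the empty one."""
    return [I for k in range(n + 1) for I in itertools.combinations(range(1, n + 1), k)]

n = 4  # sample dimension
print(len(basis(n)), 2 ** n)      # the two counts agree
print(wedge_basis((2,), (1,)))    # e_2 ^ e_1 is minus e_1 ^ e_2
print(wedge_basis((1, 3), (3,)))  # a repeated index gives zero
```

Extending `wedge_basis` linearly to combinations of basis elements would give the full product; that step is omitted here to keep the sketch short.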


The upshot of all this is that for finite-dimensional vector spaces we have another way to
construct a model of the exterior algebra. Namely, we choose a basis $e_1, \ldots, e_n$ and define
formal bi-vector symbols $e_i \wedge e_j$ subject to the anti-symmetric relations
\[ e_i \wedge e_i = 0, \quad \text{and} \quad e_i \wedge e_j = -e_j \wedge e_i. \]

We then define tri-vector symbols, and more generally $k$-vector symbols subject to the obvious
$k$-place anti-symmetric relations. The exterior algebra $\Lambda(V)$ is then defined to be the
vector space of all possible linear combinations of the $k$-vector symbols, and the algebra
product is defined by linearly extending the wedge product to all of $\Lambda(V)$. Also note that
for $k > n$ all $k$-vector symbols are identified with zero, and hence that $\Lambda^k(V) = \{0\}$ for all
$k > n$.

Notes. The exterior algebra is also known as the Grassmann algebra after its inventor
Hermann Grassmann, who created it in order to give an algebraic treatment of linear geometry.
Grassmann was also one of the first people to talk about the geometry of an $n$-dimensional
space with $n$ an arbitrary natural number. The axiomatics of the exterior product are needed
to define differential forms, and therefore play an essential role in the theory of integration
on manifolds. Exterior algebra is also an essential prerequisite to understanding de Rham’s
theory of differential cohomology.

Chapter 241


15A99 — Miscellaneous topics 


241.1 Kronecker delta 


The Kronecker delta $\delta_{ij}$ is defined as having value 1 when $i = j$ and 0 otherwise ($i$ and $j$ are
integers). It may also be written as $\delta^{ij}$ or $\delta^i_j$. It is a special case of the generalized
Kronecker delta symbol.

The delta symbol was first used in print by Kronecker in 1868 [1].

Example.

The $n \times n$ identity matrix $I$ can be written in terms of the Kronecker delta as simply the
matrix of the delta, $I_{ij} = \delta_{ij}$, or simply $I = (\delta_{ij})$.
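The example translates almost word for word into code. In the sketch below the function names `kronecker_delta` and `identity` are hypothetical, and indices start at 0 only because that is the Python convention.

```python
def kronecker_delta(i, j):
    """Return 1 when the integer indices agree and 0 otherwise."""
    return 1 if i == j else 0

def identity(n):
    """Build the n x n identity matrix entry by entry from the delta."""
    return [[kronecker_delta(i, j) for j in range(n)] for i in range(n)]

for row in identity(3):
    print(row)
```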


REFERENCES 


1. N. Higham, Handbook of writing for the mathematical sciences, Society for Industrial and
Applied Mathematics, 1998.

241.2 dual space


Dual of a vector space; dual bases; bidual

Let $V$ be a vector space over a field $k$. The dual of $V$, denoted by $V^*$, is the vector space of
linear forms on $V$, i.e. linear mappings $V \to k$. The operations in $V^*$ are defined pointwise:

\begin{align*}
(\varphi + \psi)(v) &= \varphi(v) + \psi(v) \\
(\lambda \varphi)(v) &= \lambda \varphi(v)
\end{align*}
for $\lambda \in k$, $v \in V$ and $\varphi, \psi \in V^*$.

$V$ is isomorphic to $V^*$ if and only if the dimension of $V$ is finite. If not, then $V^*$ has a larger
(infinite) dimension than $V$; in other words, the cardinal of any basis of $V^*$ is strictly greater
than the cardinal of any basis of $V$.

Even when $V$ is finite-dimensional, there is no canonical or natural isomorphism $V \to V^*$.
But on the other hand, a basis $B$ of $V$ does define a basis $B^*$ of $V^*$, and moreover a bijection
$B \to B^*$. For suppose $B = \{b_1, \ldots, b_n\}$. For each $i$ from 1 to $n$, define a mapping
\[ \beta_i : V \to k \]
by
\[ \beta_i \Bigl( \sum_k x_k b_k \Bigr) = x_i. \]

It is easy to see that the $\beta_i$ are nonzero elements of $V^*$ and are linearly independent. Thus
$\{\beta_1, \ldots, \beta_n\}$ is a basis of $V^*$, called the dual basis of $B$.
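To see the dual basis in a concrete case, the sketch below works in the plane with integer coordinates and exact rational arithmetic. It relies on one assumption beyond the text: in coordinates, the functionals $\beta_i$ are represented by the rows of the inverse of the matrix whose columns are $b_1$ and $b_2$. The function name `dual_basis_2d` and the sample basis are hypothetical.

```python
from fractions import Fraction

def dual_basis_2d(b1, b2):
    """Return functionals (beta1, beta2) with beta_i(x1*b1 + x2*b2) = x_i.

    The coordinates x_i are found by inverting the 2 x 2 matrix whose columns
    are b1 and b2; the rows of that inverse represent beta1 and beta2.
    Entries are assumed to be integers so that Fraction stays exact.
    """
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 do not form a basis")
    row1 = (Fraction(b2[1], det), Fraction(-b2[0], det))
    row2 = (Fraction(-b1[1], det), Fraction(b1[0], det))
    beta1 = lambda v: row1[0] * v[0] + row1[1] * v[1]
    beta2 = lambda v: row2[0] * v[0] + row2[1] * v[1]
    return beta1, beta2

b1, b2 = (1, 1), (1, -1)  # a sample basis of the plane
beta1, beta2 = dual_basis_2d(b1, b2)

# Each beta_i takes the value 1 on b_i and 0 on the other basis vector.
print(beta1(b1), beta1(b2), beta2(b1), beta2(b2))

# Coordinates of v = 3*b1 + 5*b2 are recovered by the dual basis.
v = (3 * b1[0] + 5 * b2[0], 3 * b1[1] + 5 * b2[1])
print(beta1(v), beta2(v))
```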


The dual of $V^*$ is called the second dual or bidual of $V$. There is a very simple canonical
injection $V \to V^{**}$, and it is an isomorphism if the dimension of $V$ is finite. To see it, let $x$
be any element of $V$ and define a mapping $x' : V^* \to k$ simply by
\[ x'(\varphi) = \varphi(x). \]

$x'$ is linear by definition, and it is readily verified that the mapping $x \mapsto x'$ from $V$ to $V^{**}$ is
linear and injective.


Dual of a topological vector space 


If $V$ is a topological vector space, the continuous dual $V'$ of $V$ is the subspace of $V^*$
consisting of the continuous linear forms.

A normed vector space $V$ is said to be reflexive if the natural embedding $V \to V''$ is an
isomorphism. For example, any finite dimensional space is reflexive, and any Hilbert space
is reflexive by the Riesz representation theorem.

Remarks 

Linear forms are also known as linear functionals. 

Another way in which a linear mapping $V \to V^*$ can arise is via a bilinear form
\[ V \times V \to k. \]

The notions of duality extend, in part, from vector spaces to modules, especially free modules
over commutative rings. A related notion is the duality in projective spaces.

241.3 example of trace of a matrix


Let
\[ A = \begin{pmatrix} 2 & 4 & 6 \\ 8 & 10 & 12 \\ 14 & 16 & 18 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 9 & 8 & 7 \\ 6 & 5 & 4 \\ 3 & 2 & 1 \end{pmatrix}, \]
then:

• $\operatorname{trace}(A + B) = \operatorname{trace}(A) + \operatorname{trace}(B) = (2 + 10 + 18) + (9 + 5 + 1) = 45$

• trace(A) = trace(cA') = c · trace(

241.4 generalized Kronecker delta symbol


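Let $l$ and $n$ be natural numbers such that $1 \leq l \leq n$. Further, let $i_k$ and $j_k$ be natural numbers
in $\{1, \ldots, n\}$ for all $k$ in $\{1, \ldots, l\}$. Then the generalized Kronecker delta symbol,
denoted by $\delta^{i_1 \cdots i_l}_{j_1 \cdots j_l}$, is zero if $i_r = i_s$ or $j_r = j_s$ for some $r \neq s$, or if
$\{i_1, \ldots, i_l\} \neq \{j_1, \ldots, j_l\}$ as sets. If none of the above conditions are met, then
$\delta^{i_1 \cdots i_l}_{j_1 \cdots j_l}$ is defined as the sign of the permutation that maps $i_1 \cdots i_l$ to $j_1 \cdots j_l$.

From the definition, it follows that when $l = 1$, the generalized Kronecker delta symbol
reduces to the traditional delta symbol $\delta^i_j$. Also, for $l = n$, we obtain
\begin{align*}
\delta^{i_1 \cdots i_n}_{j_1 \cdots j_n} &= \varepsilon^{i_1 \cdots i_n} \varepsilon_{j_1 \cdots j_n}, \\
\delta^{1 \cdots n}_{j_1 \cdots j_n} &= \varepsilon_{j_1 \cdots j_n},
\end{align*}
where $\varepsilon_{j_1 \cdots j_n}$ is the Levi-Civita permutation symbol.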
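The definition and the two special cases can be checked by brute force for small $n$. The sketch below is an illustration under assumptions of its own: index sequences are Python tuples, and the names `perm_sign`, `generalized_delta` and `levi_civita` are invented for the example.

```python
import itertools

def perm_sign(seq):
    """Sign of an arrangement of distinct numbers, from the parity of its inversions."""
    inversions = sum(1 for a in range(len(seq)) for b in range(a + 1, len(seq))
                     if seq[a] > seq[b])
    return -1 if inversions % 2 else 1

def generalized_delta(upper, lower):
    """Generalized Kronecker delta, computed directly from the definition."""
    if len(set(upper)) < len(upper) or len(set(lower)) < len(lower):
        return 0  # some index is repeated
    if set(upper) != set(lower):
        return 0  # the two index sets differ
    # Position in `upper` of each lower index; the sign of this permutation is the symbol.
    return perm_sign([upper.index(j) for j in lower])

def levi_civita(indices):
    """Levi-Civita symbol: sign of the arrangement, or 0 if an index repeats."""
    if len(set(indices)) < len(indices):
        return 0
    return perm_sign(indices)

n = 3  # sample size for the exhaustive check
mismatches = 0
for upper in itertools.product(range(1, n + 1), repeat=n):
    for lower in itertools.product(range(1, n + 1), repeat=n):
        if generalized_delta(upper, lower) != levi_civita(upper) * levi_civita(lower):
            mismatches += 1
print(mismatches)  # number of index choices where the l = n identity fails
```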
For any $l$ we can write the generalized delta symbol as a determinant of traditional delta
symbols. Indeed, if $S(l)$ is the permutation group of $l$ elements, then
\begin{align*}
\delta^{i_1 \cdots i_l}_{j_1 \cdots j_l} &= \sum_{\tau \in S(l)} \operatorname{sign} \tau \; \delta^{i_{\tau(1)}}_{j_1} \cdots \delta^{i_{\tau(l)}}_{j_l} \\
&= \det \begin{pmatrix} \delta^{i_1}_{j_1} & \cdots & \delta^{i_l}_{j_1} \\ \vdots & \ddots & \vdots \\ \delta^{i_1}_{j_l} & \cdots & \delta^{i_l}_{j_l} \end{pmatrix}.
\end{align*}

The first equality follows since, when the indices are distinct, the sum on the first line has at
most one non-zero term: the term for which $i_{\tau(k)} = j_k$ for all $k$. The second equality follows
from the definition of the determinant.

241.5 linear functional
Let $V$ be a vector space over a field $K$. A linear functional on $V$ is a linear transformation
$\phi : V \to K$, where $K$ is thought of as a one-dimensional vector space over itself.

The set of all linear functionals on $V$ can be made into a vector space by defining
addition and scalar multiplication pointwise; it is called the dual space of $V$.

241.6 modules are a generalization of vector spaces
A module is the natural generalization of a vector space; in fact, when working over a field
it is just another word for a vector space.

If $M$ and $N$ are $R$-modules then a mapping $f : M \to N$ is called an $R$-morphism (or
homomorphism) if:

\[ \forall x, y \in M : f(x + y) = f(x) + f(y) \quad \text{and} \quad \forall x \in M \; \forall \lambda \in R : f(\lambda x) = \lambda f(x). \]

Note that, as mentioned in the beginning, if $R$ is a field, these are the defining properties
for a linear transformation.

Similarly, in vector space terminology the image $\operatorname{Im} f := \{f(x) : x \in M\}$ and the kernel
$\operatorname{Ker} f := \{x \in M : f(x) = 0_N\}$ are called the range and null-space respectively.

241.7 proof of properties of trace of a matrix


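Proof of properties

1. Let us check linearity. For sums we have

\begin{align*}
\operatorname{trace}(A + B) &= \sum_{i=1}^{n} (a_{ii} + b_{ii}) && \text{(property of matrix addition)} \\
&= \sum_{i=1}^{n} a_{ii} + \sum_{i=1}^{n} b_{ii} && \text{(property of sums)} \\
&= \operatorname{trace}(A) + \operatorname{trace}(B).
\end{align*}

Similarly,

\begin{align*}
\operatorname{trace}(cA) &= \sum_{i=1}^{n} c \cdot a_{ii} && \text{(property of matrix scalar multiplication)} \\
&= c \cdot \sum_{i=1}^{n} a_{ii} && \text{(property of sums)} \\
&= c \cdot \operatorname{trace}(A).
\end{align*}

2. The second property follows since the transpose does not alter the entries on the main
diagonal; the conjugate transpose replaces each diagonal entry by its complex conjugate, so
the trace is conjugated as well.

3. The proof of the third property follows by exchanging the summation order. Suppose
$A$ is an $n \times m$ matrix and $B$ is an $m \times n$ matrix. Then

\begin{align*}
\operatorname{trace} AB &= \sum_{i=1}^{n} \sum_{j=1}^{m} A_{ij} B_{ji} \\
&= \sum_{j=1}^{m} \sum_{i=1}^{n} B_{ji} A_{ij} && \text{(changing summation order)} \\
&= \operatorname{trace} BA.
\end{align*}

4. The last property is a consequence of Property 3 and the fact that matrix multiplication
is associative:

\begin{align*}
\operatorname{trace}(B^{-1} A B) &= \operatorname{trace}((B^{-1} A) B) \\
&= \operatorname{trace}(B (B^{-1} A)) \\
&= \operatorname{trace}((B B^{-1}) A) \\
&= \operatorname{trace}(A).
\end{align*}

241.8 quasipositive matrix
A square matrix $A$ is a quasipositive matrix if it is nonnegative except perhaps on its main
diagonal, i.e., $a_{ij} \geq 0$ for $i \neq j$.

241.9 trace of a matrix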


Definition

Let $A = (a_{ij})$ be a real or possibly complex square matrix of dimension $n$. The trace of
the matrix is the sum of the main diagonal:

\[ \operatorname{trace}(A) = \sum_{i=1}^{n} a_{ii}. \]

Properties

1. The trace is a linear transformation from the space of square matrices to the real
(or complex) numbers. In other words, if $A$ and $B$ are square matrices of the same dimension
and $c$ is a scalar, then

\begin{align*}
\operatorname{trace}(A + B) &= \operatorname{trace}(A) + \operatorname{trace}(B), \\
\operatorname{trace}(cA) &= c \cdot \operatorname{trace}(A).
\end{align*}

2. For the transpose and conjugate transpose, we have for any square matrix $A$,
\begin{align*}
\operatorname{trace}(A^t) &= \operatorname{trace}(A), \\
\operatorname{trace}(A^*) &= \overline{\operatorname{trace}(A)}.
\end{align*}

3. If $A$ and $B$ are matrices such that $AB$ is a square matrix, then
\[ \operatorname{trace}(AB) = \operatorname{trace}(BA). \]
Thus, if $A, B, C$ are matrices such that $ABC$ is a square matrix, then

\[ \operatorname{trace}(ABC) = \operatorname{trace}(CAB) = \operatorname{trace}(BCA). \]

4. If $B$ is an invertible square matrix of the same dimension as $A$, then
\[ \operatorname{trace}(A) = \operatorname{trace}(B^{-1} A B). \]
In other words, the traces of similar matrices are equal.

5. Let $A$ be a square $n \times n$ matrix with real (or complex) entries $a_{ij}$. Then

\begin{align*}
\operatorname{trace} A^* A &= \operatorname{trace} A A^* \\
&= \sum_{i,j=1}^{n} |a_{ij}|^2.
\end{align*}

Here $^*$ is the conjugate transpose, and $|\cdot|$ is the complex modulus. In particular,
$\operatorname{trace} A^* A \geq 0$ with equality if and only if $A = 0$. (See the Frobenius matrix norm.)

6. Various inequalities for trace are given in [2].

See the proof of properties of trace of a matrix.
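Properties 3 and 5 are easy to try out numerically. The sketch below stores matrices as lists of rows and uses only built-in Python numbers. The helper names (`matmul`, `trace`, `conjugate_transpose`) and the sample matrices `A`, `B`, `C` are assumptions made for this illustration and are unrelated to the matrices used elsewhere in the text.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    """Sum of the main diagonal of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def conjugate_transpose(A):
    """Transpose the matrix and take the complex conjugate of every entry."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

# Property 3 with non-square factors: A is 2 x 3 and B is 3 x 2.
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [2, -1], [0, 3]]
print(trace(matmul(A, B)), trace(matmul(B, A)))  # the two traces agree

# Property 5 for a small complex matrix.
C = [[1 + 2j, 3], [0, -1j]]
sum_of_squares = sum(abs(entry) ** 2 for row in C for entry in row)
print(trace(matmul(conjugate_transpose(C), C)), sum_of_squares)
```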


REFERENCES 


1. The Trace of a Square Matrix. Paul Ehrlich, [online]
http://www.math.ufl.edu/~ehrlich/trace.html

2. Z.P. Yang, X.X. Feng, A note on the trace inequality for products of Hermitian matrix
power, Journal of Inequalities in Pure and Applied Mathematics, Volume 3, Issue 5, 2002,